import numpy as np
import pandas as pd
from pathlib import Path
import plotly.express as px
import plotly.graph_objs as go
import warnings
warnings.simplefilter("ignore")
import matplotlib.pyplot as plt
import seaborn as sns
from matplotlib.ticker import MaxNLocator
Forecasting UAE Non-Oil GDP: An Econometric Analysis of Local Indicators
Introduction
In this project, I explored the impact of localized and often underused economic indicators on the UAE’s Non-Oil GDP growth. This work was part of a research conference I participated in, under the research paper and research poster categories; my poster won 1st place and my paper received a special mention.
The UAE has made major progress in diversifying its economy away from oil, so it is important to understand what drives short-term economic performance in non-oil areas like real estate, tourism, and credit markets.
Traditional indicators like M2 supply, oil production, unemployment, or CPI often miss these localized shifts. That’s what motivated my research: to explore whether more localized and underused indicators can help better predict short-term non-oil GDP movements and provide more timely insights. This is beneficial for policymakers, businesses, and investors to help them make better-informed decisions.
While several studies have shown that models like ARIMA, VAR, and OLS can forecast UAE GDP (McCloskey & Remor, 2025), these efforts have largely relied on broad macro variables such as oil prices, inflation, or global demand. Similarly, Bentour and Fund (2022) and Cherif et al. (2011) focused on traditional macroeconomic factors, with limited attention to localized, sector-specific dynamics. El Mahmah (2017) came closest to this study by using quarterly data and including the Purchasing Managers’ Index (PMI), but still relied heavily on conventional indicators.
I will walk through the steps I took during the research process, from data acquisition to model development and evaluation. During my research poster and paper presentation, I had limited space and time to showcase the full process, so this notebook includes additional sections I couldn’t cover.
As always, I started by importing the core libraries that I frequently use. Additional libraries were imported later as needed throughout the analysis.
Data Collection & Preprocessing
The first step was acquiring the datasets and performing initial preprocessing to prepare the data for analysis. Some datasets were already available in Excel format, while others required manual entry. I prefer using Excel as my primary method of storing data since it allows for easy review, manual adjustments when needed, and convenient retrieval.
The Dependent Variable
The dependent variable is the quarter-on-quarter growth rate (in %) of Non-Oil GDP at constant prices, sourced from the Central Bank of the UAE’s Quarterly Economic Review.
Primary Source: Central Bank Publications
The sample covers the period from Q1 2014 to Q2 2024, providing 42 quarterly observations. All independent variables were transformed to this same quarterly frequency to ensure proper alignment.
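The published series already reports the growth rate, so no transformation was needed here. For completeness, this is a minimal sketch of how a QoQ growth series could be derived if only constant-price levels were available (the level figures below are hypothetical, not CBUAE data):

```python
import pandas as pd

# Hypothetical constant-price non-oil GDP levels (AED bn) for four quarters
levels = pd.Series(
    [280.0, 291.2, 285.0, 300.0],
    index=pd.period_range("2014Q1", periods=4, freq="Q"),
    name="Non Oil GDP Level",
)

# Quarter-on-quarter growth in percent: (P_t / P_{t-1} - 1) * 100
qoq_growth = levels.pct_change().mul(100).round(1)
print(qoq_growth)  # first quarter is NaN, then 4.0, -2.1, 5.3
```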
Alternative sources for this data include:
- UAE Quarterly National Accounts (FCSC)
- FCSC Data Portal
While multiple sources are available, it’s important to always cross-check the data and reference the primary source to ensure accuracy and validity.
# import the compiled data
ngdp = pd.read_excel(ngdp_data_path) # Use your own relevant file path, personal file path hidden for privacy purposes
ngdp.set_index('Date',inplace=True)
ngdp.index = ngdp.index.to_period('M') # Convert the date format to a PeriodIndex with monthly frequency (year and month)
print(ngdp.info())
ngdp.head()
<class 'pandas.core.frame.DataFrame'>
PeriodIndex: 42 entries, 2014-03 to 2024-06
Freq: M
Data columns (total 1 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Non Oil GDP 42 non-null float64
dtypes: float64(1)
memory usage: 672.0 bytes
None
| Non Oil GDP | |
|---|---|
| Date | |
| 2014-03 | 5.0 |
| 2014-06 | 6.2 |
| 2014-09 | 5.1 |
| 2014-12 | 9.1 |
| 2015-03 | 7.4 |
Independent Variables
The following independent variables were selected for their potential to reflect localized economic activity that is often overlooked by traditional macroeconomic indicators.
Some variables were originally reported at different frequencies (e.g., monthly) and covered varying time periods. As part of the preprocessing workflow, each series was first transformed to a quarterly frequency for consistency with the dependent variable, and in later sections is trimmed to the dependent variable’s observation window, ensuring full alignment across the dataset.
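The trimming step amounts to an index intersection. A minimal sketch with toy series (illustrative names and values, not the notebook’s actual variables):

```python
import pandas as pd

# Toy quarterly series indexed by quarter-end month, with different spans
gdp_toy = pd.Series(
    [5.0, 6.2, 5.1],
    index=pd.PeriodIndex(["2014-03", "2014-06", "2014-09"], freq="M"),
)
pmi_toy = pd.Series(
    [57.4, 57.9, 58.0, 59.3],
    index=pd.PeriodIndex(["2014-03", "2014-06", "2014-09", "2014-12"], freq="M"),
)

# Keep only the quarters present in both series, i.e. trim the predictor
# to the dependent variable's observation window
common = gdp_toy.index.intersection(pmi_toy.index)
gdp_aligned, pmi_aligned = gdp_toy.loc[common], pmi_toy.loc[common]
print(len(common))  # 3 overlapping quarters
```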
UAE Purchasing Managers’ Index (PMI)
PMI data was collected from Trading Economics for recent periods and supplemented with earlier records from the OPEC Monthly Oil Market Reports for 2014 and 2015.
I computed the quarterly average of the monthly PMI values to produce a single value per quarter, reflecting the overall sentiment of business activity in the UAE during that period. While I also considered using the quarter-on-quarter change, the PMI index itself is already informative: values above 50 indicate economic expansion, while values below 50 signal contraction.
Data Sources:
- Trading Economics – UAE PMI (Account required to view full data)
- OPEC Monthly Oil Market Report – 2015
- OPEC Monthly Oil Market Report – 2014
# Load the monthly PMI data
pmi = pd.read_excel(pmi_path) # Use your own relevant file path
# Set the 'Date' column as the index
pmi.set_index('Date', inplace=True)
# Convert index to monthly PeriodIndex (e.g., 2015-03)
pmi.index = pmi.index.to_period('M')
# Resample monthly data to quarterly by taking the average of each quarter
pmi_quarterly = pmi.to_timestamp().resample('Q').mean()
# Optional: Convert index back to monthly period for consistency with other series
pmi_quarterly.index = pmi_quarterly.index.to_period('M')
print(pmi_quarterly.info())
pmi_quarterly.head()
<class 'pandas.core.frame.DataFrame'>
PeriodIndex: 44 entries, 2014-03 to 2024-12
Freq: M
Data columns (total 1 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 PMI 44 non-null float64
dtypes: float64(1)
memory usage: 704.0 bytes
None
| PMI | |
|---|---|
| Date | |
| 2014-03 | 57.366667 |
| 2014-06 | 57.933333 |
| 2014-09 | 58.000000 |
| 2014-12 | 59.300000 |
| 2015-03 | 57.900000 |
Dubai Residential Sales Price Index
Real estate data was sourced from the Dubai Pulse Open Data Platform, based on the Mo’asher House Price Index developed by the Dubai Land Department in collaboration with Property Finder.
The index applies a hedonic regression methodology to adjust for property characteristics, allowing for meaningful comparisons over time. I calculated the quarterly log change using price levels from the first and last months of each quarter.
Using returns instead of average prices provides a clearer measure of price momentum and directional change, making it more suitable for time-series modeling and capturing economic signals relevant to GDP growth.
Data Source:
- Dubai Pulse – Residential Sales Price Index
# Load the monthly residential data
residential = pd.read_excel(residential_path) # Use your own relevant file path
# Set the 'Date' column as the index
residential.set_index('Date', inplace=True)
# Convert index to monthly PeriodIndex (e.g., 2015-03)
residential.index = residential.index.to_period('M')
# Convert back to timestamp for resampling
residential_ts = residential.to_timestamp()
# Get first and last month of each quarter and compute log return
quarterly_open = residential_ts.resample('Q').first()
quarterly_close = residential_ts.resample('Q').last()
# Calculate log change: ln(P_last / P_first)
residential_log = np.log(quarterly_close / quarterly_open)
# Convert index back to PeriodIndex for consistency
residential_log.index = residential_log.index.to_period('M')
print(residential_log.info())
residential_log.head()
<class 'pandas.core.frame.DataFrame'>
PeriodIndex: 44 entries, 2014-03 to 2024-12
Freq: M
Data columns (total 1 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Residential Sales Index 44 non-null float64
dtypes: float64(1)
memory usage: 704.0 bytes
None
| Residential Sales Index | |
|---|---|
| Date | |
| 2014-03 | 0.026271 |
| 2014-06 | 0.031828 |
| 2014-09 | 0.021766 |
| 2014-12 | 0.019616 |
| 2015-03 | 0.003086 |
International Visitors to Dubai (Tourism)
Monthly data on international visitor arrivals was sourced from the Dubai Department of Economy and Tourism (DDET), accessed via the Dubai Pulse Open Data Platform.
To capture the evolution of tourism activity in Dubai, I first calculated quarterly totals of international visitors. I then calculated the log change of these quarterly totals, approximating the percentage change in tourism inflows from one quarter to the next. This approach focuses on how tourism activity is changing over time rather than its absolute level, which is more useful for identifying growth patterns and economic turning points that may influence non-oil GDP.
Data Sources:
- Dubai Pulse – Tourism Data (2014–2023)
- Dubai DET – 2024 Tourism Reports
Important Note for 2024 Data:
The 2024 tourism data was sourced manually from the Dubai DDET webpage reports. Since it is reported cumulatively, the previous month’s total must be subtracted from the current month to obtain the monthly figure. For example:
- January visitors = 1.47 million
- January–February cumulative visitors = 3.10 million
- February visitors = 3.10 – 1.47 = 1.63 million
I performed this adjustment manually in Excel, as it was quicker and more convenient than scripting it in Python.
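For reference, the same cumulative-to-monthly adjustment could be done in pandas with `diff()`. A minimal sketch using the figures above (the March cumulative value is hypothetical):

```python
import pandas as pd

# Cumulative year-to-date visitor totals in millions, mirroring the 2024
# reporting style (January and February match the example above)
cumulative = pd.Series(
    [1.47, 3.10, 5.18],
    index=pd.PeriodIndex(["2024-01", "2024-02", "2024-03"], freq="M"),
)

# Monthly figure = current cumulative total minus the previous month's total;
# January is simply the first cumulative value
monthly = cumulative.diff()
monthly.iloc[0] = cumulative.iloc[0]
print(monthly.round(2))  # February = 3.10 - 1.47 = 1.63
```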
# Load the monthly visitors data
visitors = pd.read_excel(visitors_path) # Use your own relevant file path
# Set the 'Date' column as the index
visitors.set_index('Date', inplace=True)
# Convert index to monthly PeriodIndex (e.g., 2015-03)
visitors.index = visitors.index.to_period('M')
# Convert back to TimestampIndex for resampling
visitors_ts = visitors.to_timestamp()
# Aggregate monthly visitor counts into quarterly totals
visitors_q = visitors_ts.resample("Q").sum()
# Replace zero visitors with small value to avoid log(0)
visitors_q['Visitors'] = visitors_q['Visitors'].replace(0, 1e-6)
# Calculate log change to get quarter-over-quarter growth in visitor inflows
visitors_log = np.log(visitors_q / visitors_q.shift(1))
# Drop the first NaN resulting from the shift
visitors_log = visitors_log.dropna()
# Convert index back to PeriodIndex (optional, for consistency)
visitors_log.index = visitors_log.index.to_period('M')
print(visitors_log.info())
# Preview the result
visitors_log.head()
<class 'pandas.core.frame.DataFrame'>
PeriodIndex: 43 entries, 2014-06 to 2024-12
Freq: M
Data columns (total 1 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Visitors 43 non-null float64
dtypes: float64(1)
memory usage: 688.0 bytes
None
| Visitors | |
|---|---|
| Date | |
| 2014-06 | -0.119338 |
| 2014-09 | -0.145027 |
| 2014-12 | 0.263317 |
| 2015-03 | 0.077719 |
| 2015-06 | -0.153976 |
Personal and Business Lending (Net Balance Indices)
These indicators were sourced from the Credit Sentiment Survey published by the Central Bank of the UAE. The survey gathers responses from senior credit officers across banks and financial institutions in the UAE. The results are presented as net balance indices, representing the weighted percentage difference between respondents expecting an increase versus those expecting a decrease in loan demand.
Specifically, I used responses to the following forward-looking questions:
- Personal Lending Demand: Derived from Question 7 of the survey — “Over the next quarter, how do you expect demand for personal loans to change?”
- Business Lending Demand: Derived from Question 8 of the survey — “Over the next quarter, how do you expect demand for business loans to change?”
Each index directly captures expectations about future loan demand and is reported quarterly, so no frequency adjustments were required.
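To make the net balance construction concrete, here is a toy calculation (my own illustrative shares; any intensity weighting the CBUAE may apply is ignored):

```python
# Toy weighted response shares (%) from credit officers on next-quarter
# loan demand: expecting an increase, no change, or a decrease
increase, unchanged, decrease = 55.0, 35.0, 10.0

# Net balance index: share expecting an increase minus share expecting a
# decrease; positive readings signal expected growth in loan demand
net_balance = increase - decrease
print(net_balance)  # 45.0
```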
Data Source:
- Central Bank of the UAE – Publications (The Excel files contain all survey questions and corresponding indices.)
# import the compiled data
personal = pd.read_excel(personal_path) # Use your own relevant file path
personal.set_index('Date',inplace=True)
personal.index = personal.index.to_period('M') # Convert the date format to a PeriodIndex with monthly frequency (year and month)
print(personal.info())
personal.head()
<class 'pandas.core.frame.DataFrame'>
PeriodIndex: 44 entries, 2014-03 to 2024-12
Freq: M
Data columns (total 1 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Personal Lending Demand 44 non-null float64
dtypes: float64(1)
memory usage: 704.0 bytes
None
| Personal Lending Demand | |
|---|---|
| Date | |
| 2014-03 | 35.267857 |
| 2014-06 | 15.300000 |
| 2014-09 | 14.285714 |
| 2014-12 | 22.784810 |
| 2015-03 | 6.896552 |
# import the compiled data
business = pd.read_excel(business_path) # Use your own relevant file path
business.set_index('Date',inplace=True)
business.index = business.index.to_period('M') # Convert the date format to a PeriodIndex with monthly frequency (year and month)
print(business.info())
business.head()
<class 'pandas.core.frame.DataFrame'>
PeriodIndex: 44 entries, 2014-03 to 2024-12
Freq: M
Data columns (total 1 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Business Lending Demand 44 non-null float64
dtypes: float64(1)
memory usage: 704.0 bytes
None
| Business Lending Demand | |
|---|---|
| Date | |
| 2014-03 | 43.589744 |
| 2014-06 | 32.777778 |
| 2014-09 | 32.743363 |
| 2014-12 | 33.333333 |
| 2015-03 | 24.576271 |
Exploratory Data Analysis
In this section, I conducted a quick review of all the variables.
I first plotted each series to inspect its raw behavior: trends, seasonality, and any outliers. After that, I reviewed summary statistics and distribution plots (with KDE overlays) to get a quick sense of central tendency, volatility, and skewness.
Below are the key takeaways for each indicator without going into exhaustive detail.
Visual Overview & Key Patterns
Non-Oil GDP Quarterly Change (%)
GDP growth oscillated around 5–7% before 2020, plunged to –9% in the COVID quarter, then sprang back above 10% briefly in late 2020. Since then, growth settled in a 3–7% band, showing a return to moderate expansion.
UAE PMI Monthly Index
The PMI hovered comfortably above 50 from 2014 to 2019, dipped sharply below 50 during early 2020 (COVID lockdowns), and then climbed back to the mid-50s in the recovery phase. Seasonality was mild, but the COVID shock stood out clearly.
Residential Sales Price Index
Home prices were flat-to-slightly up from 2014–2019, dipped modestly in 2020, then accelerated steadily from 2021 onward, rising from about 1.10 to 1.68 by 2024 and reflecting a robust post-COVID housing boom.
International Monthly Visitors (Millions)
Visitor arrivals showed clear seasonal peaks each year, then collapsed almost to zero in early 2020. After COVID restrictions eased, arrivals rebounded strongly past pre-pandemic levels, topping 1.5–1.8 m by 2023.
Personal Lending Demand Index
Sentiment started around the mid-teens, drifted lower into single digits by 2016–2018, then plunged into negative territory in 2020. From late 2020 onward, optimism surged back to 20–30% readings, showing households regained confidence quickly.
Business Lending Demand Index
Business sentiment followed a similar shape with high optimism in 2014, a trough in 2016, a steep drop to near zero in 2020, then a strong rebound into the mid-20s and above by 2022. The bounce-back wasn’t quite as sharp as personal lending, but still pronounced.
Overall Takeaway
All series showed a pronounced COVID-period disruption: a sharp dip followed by a synchronized rebound. Lending and PMI turned negative or below-trend in 2020, then recovered steadily. Visitor arrivals and house prices not only rebounded but overshot prior peaks, while GDP growth bounced back sharply and then normalized. This coordinated “dip-and-recover” pattern underscored the broad, economy-wide shock and subsequent stimulus-driven revival.
from plotly.subplots import make_subplots
# Dictionary of your dataframes
dataframes = {
'Non-Oil GDP Quarterly Change': ngdp,
'UAE PMI Monthly Index': pmi,
'Residential Sales Price Index': residential,
'International Monthly Visitors (In Millions)': visitors,
'Personal Lending Demand Index': personal,
'Business Lending Demand Index': business
}
# Single color for all plots
line_color = 'red'
# Create 2x3 subplot grid
fig = make_subplots(
rows=2, cols=3,
subplot_titles=list(dataframes.keys()),
horizontal_spacing=0.1,
vertical_spacing=0.15
)
# Add traces
for idx, (name, df) in enumerate(dataframes.items()):
row = idx // 3 + 1
col = idx % 3 + 1
x_values = df.index.to_timestamp() if isinstance(df.index, pd.PeriodIndex) else df.index
y_values = df.iloc[:, 0]
fig.add_trace(
go.Scatter(
x=x_values,
y=y_values,
mode="lines+markers",
line=dict(color=line_color, width=2),
marker=dict(size=4),
name=name,
showlegend=False
),
row=row, col=col
)
# Layout updates
fig.update_layout(
height=700,
width=1200, # Increased width for better spacing
paper_bgcolor='black',
plot_bgcolor='black',
font=dict(color='white'),
margin=dict(t=50)
)
# Remove gridlines and zero lines from all subplots
for i in range(1, 3): # Rows
for j in range(1, 4): # Columns
fig.update_xaxes(
row=i, col=j,
showgrid=False,
zeroline=False,
linecolor='white'
)
fig.update_yaxes(
row=i, col=j,
showgrid=False,
zeroline=False,
linecolor='white'
)
# Ensure subplot titles are white
for ann in fig['layout']['annotations']:
ann['font'] = dict(color='white')
fig.show()
Summary Statistics: QoQ Non-Oil GDP Growth (%)
To get a quick understanding of the distribution of the dependent variable, I reviewed the summary statistics and visualized the data using a histogram with a KDE (Kernel Density Estimate) line.
- Mean = 3.99%: Shows that, on average, the non-oil sector experienced moderate growth during the period.
- Standard Deviation = 4.30%: Indicates noticeable fluctuations in quarterly growth rates.
- Minimum = -9.2%: The lowest recorded value, pointing to a sharp contraction during a tough quarter.
- 25th Percentile = 2.55%: A quarter of the data falls below this value, suggesting that slower growth periods were fairly common.
- Median = 5.0%: Half of the quarters recorded growth below this rate, highlighting moderate growth as the typical case.
- 75th Percentile = 6.6%: Stronger growth periods were less frequent but still present.
- Maximum = 11.2%: The highest recorded growth rate, likely tied to an exceptional recovery or expansion period.
- Interquartile Range (IQR) = 4.05%: Shows there’s a decent spread in the middle 50% of the data.
- Range = 20.4%: From the sharpest contraction (-9.2%) to the strongest expansion (11.2%), indicating high volatility.
Overall, non-oil GDP growth was quite volatile, with noticeable swings between strong growth and sharp contractions. This highlights the value of identifying early indicators to better anticipate these shifts.
ngdp.describe().T
| count | mean | std | min | 25% | 50% | 75% | max | |
|---|---|---|---|---|---|---|---|---|
| Non Oil GDP | 42.0 | 3.992857 | 4.295624 | -9.2 | 2.55 | 5.0 | 6.6 | 11.2 |
Additional Interpretations from the Plot
Shape of the Distribution: The distribution is moderately left-skewed, with the bulk of the values between 2.5% and 7%. While positive growth is typical, a few sharp contractions pull the distribution’s tail to the left, consistent with the mean (3.99%) sitting below the median (5.0%).
Presence of Outliers or Extremes: The histogram shows a small number of extreme values on both ends. The lowest observed growth rate (−9.2%) is a clear outlier, possibly representing a severe external shock. On the other end, a few quarters exceeded 10%, which may reflect one-off surges in economic activity.
Outliers are identified using the 1.5 × IQR rule:
\[ \text{Lower bound} = Q_1 - 1.5 \times \text{IQR} = 2.55 - 1.5 \times 4.05 = -3.525 \]
\[ \text{Upper bound} = Q_3 + 1.5 \times \text{IQR} = 6.6 + 1.5 \times 4.05 = 12.675 \]
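These fences can be checked quickly in Python (quartile values taken from the summary table above):

```python
# Quartiles from the summary table of QoQ non-oil GDP growth (%)
q1, q3 = 2.55, 6.6
iqr = q3 - q1  # interquartile range, 4.05

# 1.5 x IQR fences; observations outside these bounds are flagged as outliers
lower = q1 - 1.5 * iqr
upper = q3 + 1.5 * iqr
print(round(lower, 3), round(upper, 3))  # -3.525 12.675
```

Only the −9.2% quarter falls below the lower fence; the 11.2% maximum stays inside the upper fence.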
Volatility and Economic Cycles: The plot confirms again that GDP growth values are widely spread, indicating higher volatility in the non-oil sector.
from scipy.stats import gaussian_kde
# Prepare ngdp_data
ngdp_data = ngdp["Non Oil GDP"]
# KDE estimation
kde = gaussian_kde(ngdp_data)
x_vals = np.linspace(min(ngdp_data), max(ngdp_data), 200)
y_vals = kde(x_vals)
# Create figure
fig = go.Figure()
# Add histogram
fig.add_trace(go.Histogram(
x=ngdp_data,
nbinsx=15,
marker_color='red',
opacity=0.6,
name='Histogram (Count)'
))
# Scale KDE line to match histogram count scale
# Multiply by bin width and total count to match scale
bin_width = (max(ngdp_data) - min(ngdp_data)) / 15
scaled_y_vals = y_vals * len(ngdp_data) * bin_width
# Add KDE line (scaled)
fig.add_trace(go.Scatter(
x=x_vals,
y=scaled_y_vals,
mode='lines',
line=dict(color='white', width=2),
name='KDE (scaled to count)'
))
# Update layout
fig.update_layout(
title="Distribution of QoQ Non-Oil GDP Growth (%)",
title_x=0.5,
template="plotly_dark",
plot_bgcolor='black',
paper_bgcolor='black',
font=dict(color='white'),
xaxis=dict(showgrid=False),
yaxis=dict(showgrid=False),
bargap=0.05,
width=1300,
height=600
)
fig.show()
Summary Statistics: UAE Purchasing Managers’ Index (PMI)
Monthly Index Levels
The average monthly PMI is about 54.76, with most readings between 53.8 and 56.6. A trough as low as 44.1 and a peak at 61.2 show occasional soft patches and strong expansion periods. The standard deviation of 2.74 indicates moderate month-to-month swings.
Distribution Plot
Most months cluster around 55, and the KDE curve tilts slightly to the left; those deeper dips below 50 are rarer but stretch the left tail.
Quarterly Index Levels
At the quarterly frequency, the PMI still averages 54.76, with half of the quarters between 53.85 and 56.25. The softest quarter sits at 47.07, while the strongest hits 59.3. A lower standard deviation of 2.62 reflects that aggregating to quarters slightly smooths out some monthly noise.
Distribution Plot
Quarterly values remain centered near 55, and the KDE again shows a mild left skew, as a few softer quarters pull the tail on the downside.
pmi.describe().T
| count | mean | std | min | 25% | 50% | 75% | max | |
|---|---|---|---|---|---|---|---|---|
| PMI | 132.0 | 54.759091 | 2.74393 | 44.1 | 53.8 | 55.1 | 56.6 | 61.2 |
pmi_quarterly.describe().T
| count | mean | std | min | 25% | 50% | 75% | max | |
|---|---|---|---|---|---|---|---|---|
| PMI | 44.0 | 54.759091 | 2.624363 | 47.066667 | 53.85 | 55.1 | 56.25 | 59.3 |
# Prepare datasets
pmi_data = pmi["PMI"]
pmi_quarterly_data = pmi_quarterly["PMI"]
# Setup subplot layout
fig = make_subplots(
rows=1, cols=2,
subplot_titles=["PMI Monthly Index", "PMI Quarterly Index"]
)
# Colors and common settings
colors = ['red', 'white']
nbins = 15
# --- Plot Original Data ---
kde = gaussian_kde(pmi_data)
x_vals = np.linspace(min(pmi_data), max(pmi_data), 200)
y_vals = kde(x_vals)
bin_width = (max(pmi_data) - min(pmi_data)) / nbins
scaled_y_vals = y_vals * len(pmi_data) * bin_width
fig.add_trace(go.Histogram(
x=pmi_data,
nbinsx=nbins,
marker_color='red',
opacity=0.6,
name='Original Histogram',
showlegend=False
), row=1, col=1)
fig.add_trace(go.Scatter(
x=x_vals,
y=scaled_y_vals,
mode='lines',
line=dict(color='white', width=2),
name='KDE (Original)',
showlegend=False
), row=1, col=1)
# --- Plot Quarterly Data ---
kde_log = gaussian_kde(pmi_quarterly_data)
x_vals_log = np.linspace(min(pmi_quarterly_data), max(pmi_quarterly_data), 200)
y_vals_log = kde_log(x_vals_log)
bin_width_log = (max(pmi_quarterly_data) - min(pmi_quarterly_data)) / nbins
scaled_y_vals_log = y_vals_log * len(pmi_quarterly_data) * bin_width_log
fig.add_trace(go.Histogram(
x=pmi_quarterly_data,
nbinsx=nbins,
marker_color='red',
opacity=0.6,
name='Quarterly Histogram',
showlegend=False
), row=1, col=2)
fig.add_trace(go.Scatter(
x=x_vals_log,
y=scaled_y_vals_log,
mode='lines',
line=dict(color='white', width=2),
name='KDE (Quarterly)',
showlegend=False
), row=1, col=2)
# Layout Styling
fig.update_layout(
plot_bgcolor='black',
paper_bgcolor='black',
font=dict(color='white'),
width=1300,
height=600,
bargap=0.05
)
# Hide gridlines
fig.update_xaxes(showgrid=False)
fig.update_yaxes(showgrid=False,zeroline=False)
fig.show()
Summary Statistics: Dubai Residential Sales Price Index
Monthly Index Levels
The average index level is about 1.28, with most values between 1.15 and 1.30. A few higher values (up to 1.68) point to occasional price surges in the housing market. The standard deviation of 0.16 indicates moderate month-to-month variability.
Distribution Plot
Most months cluster around the mean, and the KDE curve reveals a slight right tail as those price jumps are less frequent but noticeable.
Quarterly Log Returns
Average quarterly return is small but positive (0.6%), suggesting a gentle upward trend overall. Returns are fairly symmetric around zero, with occasional spikes (~4%) and dips (~–2%). A standard deviation of about 1.7% indicates moderate quarter-to-quarter variability in returns.
Distribution Plot
Quarterly changes mostly hover near zero, with a bell-shaped KDE and slightly heavy tails reflecting those rare big jumps or drops in residential prices.
residential.describe().T
| count | mean | std | min | 25% | 50% | 75% | max | |
|---|---|---|---|---|---|---|---|---|
| Residential Sales Index | 132.0 | 1.277955 | 0.161465 | 1.07 | 1.15525 | 1.255 | 1.29875 | 1.684 |
residential_log.describe().T
| count | mean | std | min | 25% | 50% | 75% | max | |
|---|---|---|---|---|---|---|---|---|
| Residential Sales Index | 44.0 | 0.006272 | 0.016799 | -0.021142 | -0.00485 | 0.001935 | 0.019999 | 0.042525 |
# Prepare datasets
residential_data = residential["Residential Sales Index"]
residential_log_data = residential_log["Residential Sales Index"]
# Setup subplot layout
fig = make_subplots(
rows=1, cols=2,
subplot_titles=["Residential Sales Index", "Quarterly Change"]
)
# Colors and common settings
colors = ['red', 'white']
nbins = 15
# --- Plot Original Data ---
kde = gaussian_kde(residential_data)
x_vals = np.linspace(min(residential_data), max(residential_data), 200)
y_vals = kde(x_vals)
bin_width = (max(residential_data) - min(residential_data)) / nbins
scaled_y_vals = y_vals * len(residential_data) * bin_width
fig.add_trace(go.Histogram(
x=residential_data,
nbinsx=nbins,
marker_color='red',
opacity=0.6,
name='Original Histogram',
showlegend=False
), row=1, col=1)
fig.add_trace(go.Scatter(
x=x_vals,
y=scaled_y_vals,
mode='lines',
line=dict(color='white', width=2),
name='KDE (Original)',
showlegend=False
), row=1, col=1)
# --- Plot Log-Transformed Data ---
kde_log = gaussian_kde(residential_log_data)
x_vals_log = np.linspace(min(residential_log_data), max(residential_log_data), 200)
y_vals_log = kde_log(x_vals_log)
bin_width_log = (max(residential_log_data) - min(residential_log_data)) / nbins
scaled_y_vals_log = y_vals_log * len(residential_log_data) * bin_width_log
fig.add_trace(go.Histogram(
x=residential_log_data,
nbinsx=nbins,
marker_color='red',
opacity=0.6,
name='Log-Transformed Histogram',
showlegend=False
), row=1, col=2)
fig.add_trace(go.Scatter(
x=x_vals_log,
y=scaled_y_vals_log,
mode='lines',
line=dict(color='white', width=2),
name='KDE (Log-Transformed)',
showlegend=False
), row=1, col=2)
# Layout Styling
fig.update_layout(
plot_bgcolor='black',
paper_bgcolor='black',
font=dict(color='white'),
width=1300,
height=600,
bargap=0.05
)
# Hide gridlines
fig.update_xaxes(showgrid=False)
fig.update_yaxes(showgrid=False,zeroline=False)
fig.show()
Summary Statistics: Monthly International Visitors to Dubai (In Millions)
As seen earlier in the time series plots, international visitor numbers dropped to almost zero during the COVID-19 period, and these extreme outliers produced unusually sharp changes when tourism rebounded post-COVID. I therefore removed those periods from the tourism data.
Note: For the other indicators, although they were affected by the economic downturn, the data still contained valuable information about underlying economic activity, so I chose to retain them.
visitors.describe().T # Figures are in millions
| count | mean | std | min | 25% | 50% | 75% | max | |
|---|---|---|---|---|---|---|---|---|
| Visitors | 132.0 | 1.164701 | 0.42191 | 0.0 | 1.027895 | 1.224328 | 1.44064 | 1.93 |
visitors_log.describe().T
| count | mean | std | min | 25% | 50% | 75% | max | |
|---|---|---|---|---|---|---|---|---|
| Visitors | 43.0 | 0.009462 | 3.088058 | -15.159777 | -0.132182 | 0.063523 | 0.155066 | 12.940139 |
# Prepare datasets
visitors_data = visitors["Visitors"]
visitors_log_data = visitors_log["Visitors"]
# Setup subplot layout
fig = make_subplots(
rows=1, cols=2,
subplot_titles=["Dubai International visitors", "Quarterly Change"]
)
# Colors and common settings
colors = ['red', 'white']
nbins = 15
# --- Plot Original Data ---
kde = gaussian_kde(visitors_data)
x_vals = np.linspace(min(visitors_data), max(visitors_data), 200)
y_vals = kde(x_vals)
bin_width = (max(visitors_data) - min(visitors_data)) / nbins
scaled_y_vals = y_vals * len(visitors_data) * bin_width
fig.add_trace(go.Histogram(
x=visitors_data,
nbinsx=nbins,
marker_color='red',
opacity=0.6,
name='Original Histogram',
showlegend=False
), row=1, col=1)
fig.add_trace(go.Scatter(
x=x_vals,
y=scaled_y_vals,
mode='lines',
line=dict(color='white', width=2),
name='KDE (Original)',
showlegend=False
), row=1, col=1)
# --- Plot Log-Transformed Data ---
kde_log = gaussian_kde(visitors_log_data)
x_vals_log = np.linspace(min(visitors_log_data), max(visitors_log_data), 200)
y_vals_log = kde_log(x_vals_log)
bin_width_log = (max(visitors_log_data) - min(visitors_log_data)) / nbins
scaled_y_vals_log = y_vals_log * len(visitors_log_data) * bin_width_log
fig.add_trace(go.Histogram(
x=visitors_log_data,
nbinsx=nbins,
marker_color='red',
opacity=0.6,
name='Log-Transformed Histogram',
showlegend=False
), row=1, col=2)
fig.add_trace(go.Scatter(
x=x_vals_log,
y=scaled_y_vals_log,
mode='lines',
line=dict(color='white', width=2),
name='KDE (Log-Transformed)',
showlegend=False
), row=1, col=2)
# Layout Styling
fig.update_layout(
plot_bgcolor='black',
paper_bgcolor='black',
font=dict(color='white'),
width=1300,
height=600,
bargap=0.05
)
# Hide gridlines
fig.update_xaxes(showgrid=False)
fig.update_yaxes(showgrid=False,zeroline=False)
fig.show()
# Copy original data to avoid modifying the original
visitors_covid_excluded = visitors.copy()
visitors_log_covid_excluded = visitors_log.copy()
# Drop the specific outliers
visitors_covid_excluded = visitors_covid_excluded.drop(['2020-04','2020-05','2020-06','2020-07'])
visitors_log_covid_excluded = visitors_log_covid_excluded.drop(['2020-06', '2020-09'])
Monthly Totals (Millions)
The average monthly arrivals are about 1.20 m, with most values between 1.06 m and 1.45 m. The busiest month reaches 1.93 m, while the slowest dips to 0.13 m. With a standard deviation of 0.37 m, we see moderate swings, reflecting peak tourist seasons and quieter months.
Distribution Plot
The distribution shows a slight left skew, since those unusually low-inflow months (off-peak or residual COVID effects) are less common but pull the left tail. Most months cluster around 1.2 m, matching typical demand. The KDE curve’s peak sits near the median, and its gentle lean to the left highlights those few low-visitor outliers against an otherwise consistent inflow.
Quarterly Log Returns
Average quarterly growth is about 6.4% in log terms, indicating healthy tourism recovery trends. Half the changes lie between –11.9% and 14.5%, but volatility is high (std ≈ 27.8%), and a few quarters saw dramatic rebounds (a log change of up to 1.10, roughly a tripling of visitor levels).
Distribution Plot
The changes mostly hover near zero, showing that small quarter-to-quarter moves are typical. The KDE curve is clearly right-skewed: modest growth rates dominate, but occasional sharp rebounds stretch out the right tail even with the COVID years excluded.
visitors_covid_excluded.describe().T
Figures are in millions:
| | count | mean | std | min | 25% | 50% | 75% | max |
|---|---|---|---|---|---|---|---|---|
| Visitors | 128.0 | 1.200785 | 0.374566 | 0.134203 | 1.063183 | 1.231606 | 1.4476 | 1.93 |
visitors_log_covid_excluded.describe().T
| | count | mean | std | min | 25% | 50% | 75% | max |
|---|---|---|---|---|---|---|---|---|
| Visitors | 41.0 | 0.064061 | 0.277754 | -0.299372 | -0.119338 | 0.063523 | 0.145466 | 1.101037 |
# Prepare datasets
visitors_covid_excluded = visitors_covid_excluded["Visitors"]
visitors_log_covid_excluded = visitors_log_covid_excluded["Visitors"]
# Setup subplot layout
fig = make_subplots(
rows=1, cols=2,
subplot_titles=["Dubai International Visitors", "Quarterly Change"]
)
# Colors and common settings
colors = ['red', 'white']
nbins = 15
# --- Plot Original Data ---
kde = gaussian_kde(visitors_covid_excluded)
x_vals = np.linspace(min(visitors_covid_excluded), max(visitors_covid_excluded), 200)
y_vals = kde(x_vals)
bin_width = (max(visitors_covid_excluded) - min(visitors_covid_excluded)) / nbins
scaled_y_vals = y_vals * len(visitors_covid_excluded) * bin_width
fig.add_trace(go.Histogram(
x=visitors_covid_excluded,
nbinsx=nbins,
marker_color='red',
opacity=0.6,
name='Original Histogram',
showlegend=False
), row=1, col=1)
fig.add_trace(go.Scatter(
x=x_vals,
y=scaled_y_vals,
mode='lines',
line=dict(color='white', width=2),
name='KDE (Original)',
showlegend=False
), row=1, col=1)
# --- Plot Log-Transformed Data ---
kde_log = gaussian_kde(visitors_log_covid_excluded)
x_vals_log = np.linspace(min(visitors_log_covid_excluded), max(visitors_log_covid_excluded), 200)
y_vals_log = kde_log(x_vals_log)
bin_width_log = (max(visitors_log_covid_excluded) - min(visitors_log_covid_excluded)) / nbins
scaled_y_vals_log = y_vals_log * len(visitors_log_covid_excluded) * bin_width_log
fig.add_trace(go.Histogram(
x=visitors_log_covid_excluded,
nbinsx=nbins,
marker_color='red',
opacity=0.6,
name='Log-Transformed Histogram',
showlegend=False
), row=1, col=2)
fig.add_trace(go.Scatter(
x=x_vals_log,
y=scaled_y_vals_log,
mode='lines',
line=dict(color='white', width=2),
name='KDE (Log-Transformed)',
showlegend=False
), row=1, col=2)
# Layout Styling
fig.update_layout(
title= 'Excluding COVID Period',
title_x=0.5,
plot_bgcolor='black',
paper_bgcolor='black',
font=dict(color='white'),
width=1300,
height=600,
bargap=0.05
)
# Hide gridlines
fig.update_xaxes(showgrid=False)
fig.update_yaxes(showgrid=False,zeroline=False)
fig.show()
Summary Statistics: Personal Lending Index
Key Insights
The average net-balance reading is about 14.7%, with half of respondents falling between 7.2% and 23.9%. A low of –9% shows a few officers expect a pullback, while a high of 35.3% signals strong optimism. The standard deviation of 10.9% points to moderate quarter-to-quarter swings in sentiment.
Distribution Plot
The distribution is right-skewed, with most readings between 5% and 25% and a long tail toward higher values—occasional strong optimism spikes stretch the right side. The KDE curve also tilts right, reinforcing that while big positive jumps are rarer, they pull the tail out on the high end.
personal.describe().T
| | count | mean | std | min | 25% | 50% | 75% | max |
|---|---|---|---|---|---|---|---|---|
| Personal Lending Demand | 44.0 | 14.69847 | 10.912432 | -8.988764 | 7.201666 | 13.868826 | 23.882286 | 35.267857 |
# Prepare personal_data
personal_data = personal["Personal Lending Demand"]
# KDE estimation
kde = gaussian_kde(personal_data)
x_vals = np.linspace(min(personal_data), max(personal_data), 200)
y_vals = kde(x_vals)
# Create figure
fig = go.Figure()
# Add histogram
fig.add_trace(go.Histogram(
x=personal_data,
nbinsx=15,
marker_color='red',
opacity=0.6,
name='Histogram'
))
# Scale KDE line to match histogram count scale
# Multiply by bin width and total count to match scale
bin_width = (max(personal_data) - min(personal_data)) / 15
scaled_y_vals = y_vals * len(personal_data) * bin_width
# Add KDE line (scaled)
fig.add_trace(go.Scatter(
x=x_vals,
y=scaled_y_vals,
mode='lines',
line=dict(color='white', width=2),
name='KDE'
))
# Update layout
fig.update_layout(
template="plotly_dark",
plot_bgcolor='black',
paper_bgcolor='black',
font=dict(color='white'),
xaxis=dict(showgrid=False),
yaxis=dict(showgrid=False),
xaxis_title="Personal Lending Demand Index (%)",
yaxis_title="Frequency",
bargap=0.05,
width=1300,
height=600
)
fig.show()
Summary Statistics: Business Lending Index
The average net‐balance reading sits at 21.9%, with half of the quarters between 15.0% and 28.6%. Sentiment never falls below 2.2%, and peaks at 43.6%, showing occasional bursts of strong business lending optimism. A standard deviation of 9.17% indicates moderate quarter-to-quarter swings.
Distribution Plot
The histogram is right-skewed, with most values clustering in the 15–30% range and a long tail toward higher readings. The KDE curve also leans right, highlighting that while very high optimism quarters are less common, they pull the tail out on the high end.
business.describe().T
| | count | mean | std | min | 25% | 50% | 75% | max |
|---|---|---|---|---|---|---|---|---|
| Business Lending Demand | 44.0 | 21.905948 | 9.174431 | 2.173913 | 15.038371 | 22.488558 | 28.613401 | 43.589744 |
# Prepare business_data
business_data = business["Business Lending Demand"]
# KDE estimation
kde = gaussian_kde(business_data)
x_vals = np.linspace(min(business_data), max(business_data), 200)
y_vals = kde(x_vals)
# Create figure
fig = go.Figure()
# Add histogram
fig.add_trace(go.Histogram(
x=business_data,
nbinsx=15,
marker_color='red',
opacity=0.6,
name='Histogram'
))
# Scale KDE line to match histogram count scale
# Multiply by bin width and total count to match scale
bin_width = (max(business_data) - min(business_data)) / 15
scaled_y_vals = y_vals * len(business_data) * bin_width
# Add KDE line (scaled)
fig.add_trace(go.Scatter(
x=x_vals,
y=scaled_y_vals,
mode='lines',
line=dict(color='white', width=2),
name='KDE'
))
# Update layout
fig.update_layout(
template="plotly_dark",
plot_bgcolor='black',
paper_bgcolor='black',
font=dict(color='white'),
xaxis=dict(showgrid=False),
yaxis=dict(showgrid=False),
xaxis_title="Business Lending Demand Index (%)",
yaxis_title="Frequency",
bargap=0.05,
width=1300,
height=600
)
fig.show()
Correlation Analysis
To assess the predictive potential of each indicator, I conducted a lead-lag correlation analysis between the independent variables and future values of non-oil GDP growth. I shifted the dependent variable (non-oil GDP) forward by up to four quarters, and then computed its correlation with the current values of each predictor.
The goal was to identify whether changes in indicators such as lending, PMI, or real estate activity are associated with GDP growth one or more quarters ahead. For example, if real estate prices rise today, their effect on output may take time to materialize, so a strong correlation with GDP in later quarters (e.g., Q+1 or Q+2) may suggest usefulness as a short-term leading indicator.
This methodology is consistent with macroeconomic forecasting literature, where multi-quarter lags are often utilized to capture delayed effects and dynamic relationships between variables (Hann et al., 2017).
To ensure accurate correlation analysis, all independent variables were joined with the lagged versions of non-oil GDP using their shared quarterly date index.
Since some series have different starting points and may include missing values (especially after applying time shifts), we applied .dropna() to remove any rows containing NaN values. This ensures that the correlation coefficients are computed over consistent and complete samples across all lag levels.
The resulting datasets include:
- ngdp_df_lag1 (GDP shifted one quarter ahead)
- ngdp_df_lag2, ngdp_df_lag3, ngdp_df_lag4 (up to four quarters ahead)
# Create lead versions of GDP by shifting it backward (-) in time
ngdp_lag1 = ngdp.shift(-1) # GDP one quarter ahead
ngdp_lag2 = ngdp.shift(-2) # GDP two quarters ahead
ngdp_lag3 = ngdp.shift(-3) # GDP three quarters ahead
ngdp_lag4 = ngdp.shift(-4) # GDP four quarters ahead
# Base list of predictors
predictors = [
residential_log,
visitors_log,
personal,
business,
pmi_quarterly
]
# Join with current GDP
ngdp_df = ngdp.join(predictors).dropna()
# Join with lagged GDPs
ngdp_df_lag1 = ngdp_lag1.join(predictors).dropna()
ngdp_df_lag2 = ngdp_lag2.join(predictors).dropna()
ngdp_df_lag3 = ngdp_lag3.join(predictors).dropna()
ngdp_df_lag4 = ngdp_lag4.join(predictors).dropna()
Correlation Results
After aligning and cleaning the datasets, I calculated the Pearson correlation between each independent variable and the lagged versions of non-oil GDP (from 1 to 4 quarters ahead).
The analysis showed that most indicators had their strongest relationship with non-oil GDP at a one-quarter lead, while correlations at longer lags were generally weak. This suggests that changes in business sentiment, lending activity, and property markets have an effect within the next quarter, while their impact tends to fade over longer horizons, probably due to other intervening economic factors.
Personal Lending Index, Business Lending Index, Residential Sales Price Index, and the UAE Manufacturing PMI all showed moderate positive correlations (roughly 0.52 to 0.59) at lag 1, while International Visitors showed weak correlations across all lag periods.
# Compute correlation matrices
corr = ngdp_df.corr()
corr_lag1 = ngdp_df_lag1.corr()
corr_lag2 = ngdp_df_lag2.corr()
corr_lag3 = ngdp_df_lag3.corr()
corr_lag4 = ngdp_df_lag4.corr()
# Display sorted correlations with future GDP
for i, c in enumerate([corr_lag1, corr_lag2, corr_lag3, corr_lag4], start=1):
print(f"\nLag {i} Correlations with Future Non-Oil GDP:")
print(c['Non Oil GDP'].sort_values())
Lag 1 Correlations with Future Non-Oil GDP:
Visitors 0.043703
PMI 0.515123
Residential Sales Index 0.557546
Personal Lending Demand 0.563303
Business Lending Demand 0.591599
Non Oil GDP 1.000000
Name: Non Oil GDP, dtype: float64
Lag 2 Correlations with Future Non-Oil GDP:
Visitors 0.158510
Residential Sales Index 0.393571
PMI 0.401689
Personal Lending Demand 0.476747
Business Lending Demand 0.509697
Non Oil GDP 1.000000
Name: Non Oil GDP, dtype: float64
Lag 3 Correlations with Future Non-Oil GDP:
PMI 0.199294
Residential Sales Index 0.275676
Visitors 0.352320
Personal Lending Demand 0.377320
Business Lending Demand 0.409587
Non Oil GDP 1.000000
Name: Non Oil GDP, dtype: float64
Lag 4 Correlations with Future Non-Oil GDP:
PMI -0.174258
Visitors -0.087400
Personal Lending Demand 0.162761
Business Lending Demand 0.224950
Residential Sales Index 0.280447
Non Oil GDP 1.000000
Name: Non Oil GDP, dtype: float64
fig = px.imshow(corr_lag1, color_continuous_scale='inferno')
fig.update_layout(
width=800,
height=600,
template="plotly_dark",
title="The Correlation Coefficient Of the Variables: Lag 1",
title_x=0.5
)
fig.show()
Multiple Linear Regression (OLS)
To evaluate the joint impact of multiple localized macroeconomic indicators on non-oil GDP growth, I estimated a Multiple Linear Regression model using data from the Lag 1 setup, where the independent variables are current and the dependent variable is GDP one quarter ahead, since this setup showed the strongest relationships.
The model was estimated using the Ordinary Least Squares (OLS) method. OLS is a linear estimation technique that minimizes the sum of squared differences between observed and predicted values of the dependent variable. It’s a widely used approach in econometric studies (Wooldridge, 2013).
I’ve used this model before in a Bitcoin-USD Price Prediction Study, where I explored whether indicators like moving averages, trading volume, and external factors such as gold prices could help predict Bitcoin prices.
Model Results
\[ \text{NonOilGDP}_{t+1} = \beta_0 + \beta_1 \cdot \text{Residential}_{t} + \beta_2 \cdot \text{PersonalLending}_{t} + \beta_3 \cdot \text{BusinessLending}_{t} + \beta_4 \cdot \text{PMI}_{t} + \beta_5 \cdot \text{Visitors}_{t} + \varepsilon_t \]
\[ \text{NonOilGDP}_{t+1} = -28.48 + 43.79 \cdot \text{Residential}_{t} + 0.13 \cdot \text{PersonalLending}_{t} + 0.04 \cdot \text{BusinessLending}_{t} + 0.54 \cdot \text{PMI}_{t} - 0.11 \cdot \text{Visitors}_{t} + \varepsilon_t \]
# Import regression and evaluation tools
import statsmodels.api as sm
from sklearn.metrics import mean_absolute_error, mean_squared_error
# Define predictors and target from Lag 1 dataset
X_train = ngdp_df_lag1[['Residential Sales Index',
'Personal Lending Demand',
'Business Lending Demand',
'PMI',
'Visitors']] # Independent variables
y_train = ngdp_df_lag1['Non Oil GDP'] # Dependent variable (1 quarter ahead)
# Add a constant (intercept term) to the regression model
X_train = sm.add_constant(X_train)
# Fit the OLS model
ols_model = sm.OLS(y_train, X_train).fit()
# Display model summary
ols_model.summary()
| Dep. Variable: | Non Oil GDP | R-squared: | 0.484 |
| Model: | OLS | Adj. R-squared: | 0.408 |
| Method: | Least Squares | F-statistic: | 6.383 |
| Date: | Thu, 15 May 2025 | Prob (F-statistic): | 0.000279 |
| Time: | 15:08:39 | Log-Likelihood: | -102.15 |
| No. Observations: | 40 | AIC: | 216.3 |
| Df Residuals: | 34 | BIC: | 226.4 |
| Df Model: | 5 | ||
| Covariance Type: | nonrobust |
| coef | std err | t | P>|t| | [0.025 | 0.975] | |
| const | -28.4825 | 12.153 | -2.344 | 0.025 | -53.181 | -3.784 |
| Residential Sales Index | 43.7870 | 49.551 | 0.884 | 0.383 | -56.913 | 144.487 |
| Personal Lending Demand | 0.1268 | 0.086 | 1.473 | 0.150 | -0.048 | 0.302 |
| Business Lending Demand | 0.0433 | 0.122 | 0.355 | 0.725 | -0.205 | 0.292 |
| PMI | 0.5400 | 0.233 | 2.321 | 0.026 | 0.067 | 1.013 |
| Visitors | -0.1070 | 0.182 | -0.587 | 0.561 | -0.478 | 0.263 |
| Omnibus: | 1.417 | Durbin-Watson: | 0.859 |
| Prob(Omnibus): | 0.492 | Jarque-Bera (JB): | 1.381 |
| Skew: | 0.399 | Prob(JB): | 0.501 |
| Kurtosis: | 2.563 | Cond. No. | 5.62e+03 |
Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[2] The condition number is large, 5.62e+03. This might indicate that there are
strong multicollinearity or other numerical problems.
Key Insights:
UAE Manufacturing PMI was statistically significant at the 5% level (p = 0.026), with a positive coefficient (0.540). This suggests that stronger business sentiment is associated with higher non-oil GDP growth in the subsequent quarter.
Residential Sales Price Index, Personal Lending Demand, and Business Lending Demand were not statistically significant (p-values > 0.10), although their positive coefficients are directionally consistent with economic expectations.
Monthly Visitor Numbers had a small negative coefficient (-0.107) and was statistically insignificant (p = 0.561), suggesting it does not meaningfully contribute to short-term GDP prediction at Lag 1.
Model Diagnostics:
The F-statistic = 6.383 (p < 0.001) confirms that the independent variables are jointly significant predictors of non-oil GDP growth.
Residual normality is supported by the Jarque-Bera test (p = 0.501), indicating an approximately normal distribution.
The Durbin-Watson statistic = 0.859 points to positive autocorrelation in the residuals, which may violate the OLS assumption of independence. This also motivates the VAR model in the coming sections, since it accounts for autocorrelation and models how variables interact over time.
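For intuition, the Durbin-Watson statistic is roughly 2(1 − ρ̂), where ρ̂ is the lag-1 autocorrelation of the residuals, so values well below 2 signal positive autocorrelation. A minimal sketch with simulated AR(1) "residuals" (hypothetical data, not this model's residuals):

```python
import numpy as np
from statsmodels.stats.stattools import durbin_watson

rng = np.random.default_rng(42)
e = np.zeros(200)
for t in range(1, 200):
    # positively autocorrelated residuals: AR(1) with phi = 0.6
    e[t] = 0.6 * e[t - 1] + rng.normal()

dw = durbin_watson(e)
rho = np.corrcoef(e[:-1], e[1:])[0, 1]  # sample lag-1 autocorrelation
print(f"DW = {dw:.2f}, approx 2*(1 - rho) = {2 * (1 - rho):.2f}")
```

With phi = 0.6 the statistic lands well below 2, mirroring the 0.859 observed for the OLS residuals here.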
Model Evaluation
To assess the predictive performance of the OLS model, I evaluated its in-sample predictions using four standard regression metrics:
- R-squared (R²) = 0.48: The model explains approximately 48% of the variance in non-oil GDP growth, indicating a moderate fit.
- Mean Absolute Error (MAE) = 2.54: On average, the model’s predictions differ from actual values by about 2.54 percentage points.
- Root Mean Squared Error (RMSE) = 3.11: The RMSE penalizes larger errors more heavily than MAE and reflects the standard deviation of residuals.
- Mean Absolute Percentage Error (MAPE) = 124.5%: This high value suggests the model performs poorly for smaller or near-zero GDP growth rates, as percentage errors become exaggerated. MAPE can be misleading when actual values are close to zero.
Overall, while the model captures general trends (as seen in the R²), its relatively high MAPE and RMSE highlight potential limitations in forecasting precision, especially during volatile quarters. This supports the case for exploring dynamic models like VAR, which could better handle time dependencies.
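To see why MAPE explodes when actual values sit near zero, here is a toy example with made-up growth rates (not the model's output):

```python
import numpy as np

# Hypothetical quarterly growth rates (%) and predictions
actual = np.array([0.2, 3.0, -0.1, 2.5])
pred = np.array([1.2, 2.8, 0.9, 2.3])

mae = np.mean(np.abs(actual - pred))
mape = np.mean(np.abs((actual - pred) / actual)) * 100

# The two near-zero actuals (0.2 and -0.1) dominate MAPE even though their
# absolute errors are no larger than the others.
print(f"MAE:  {mae:.2f}")
print(f"MAPE: {mape:.1f}%")
```

The MAE stays modest while the MAPE balloons into the hundreds of percent, which is the same pattern seen in this model's 124.5% MAPE alongside a 2.54 MAE.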
# Predict using the fitted OLS model on training data
ols_predictions = ols_model.predict(X_train)
# Evaluate model performance using standard metrics
ols_mae = mean_absolute_error(y_train, ols_predictions) # Average magnitude of errors
ols_rmse = mean_squared_error(y_train, ols_predictions, squared=False) # Penalizes larger errors
ols_mape = (np.abs((y_train - ols_predictions) / y_train).mean()) * 100 # Relative percentage error
ols_r2 = ols_model.rsquared # Proportion of variance explained by the model
print(f"Mean Absolute Error (MAE): {ols_mae:.2f}")
print(f"Root Mean Squared Error (RMSE): {ols_rmse:.2f}")
print(f"Mean Absolute Percentage Error (MAPE): {ols_mape:.1f}%")
print(f"R-squared (R²): {ols_r2:.3f}")
Mean Absolute Error (MAE): 2.54
Root Mean Squared Error (RMSE): 3.11
Mean Absolute Percentage Error (MAPE): 124.5%
R-squared (R²): 0.484
From the chart below, we can see that the model generally follows the same trend as the actual GDP growth; it gets the overall direction right most of the time.
However, it misses the big drop and rebound in 2020, likely due to the unexpected impact of COVID-19. The model also tends to smooth out sharp changes, so it doesn’t fully capture sudden spikes or dips.
That said, during more stable periods (like 2016–2019), the predictions are closer to the actual values, showing the model performs better when the economy is more steady.
# If needed, generate quarterly dates
date_index = pd.date_range(start='2014-01-01', periods=len(y_train), freq='Q')
# Create figure
fig = go.Figure()
# Actual GDP
fig.add_trace(go.Scatter(
x=date_index, y=y_train,
mode='lines+markers',
name='Actual GDP Growth',
line=dict(color='#FF4136', width=2), # Bright red
marker=dict(color='#FF4136', size=5)
))
# Predicted GDP
fig.add_trace(go.Scatter(
x=date_index, y=ols_predictions,
mode='lines+markers',
name='Predicted GDP Growth (OLS)',
line=dict(color='#0074D9', width=2), # Bright blue
marker=dict(color='#0074D9', size=5)
))
# Shaded error area
fig.add_trace(go.Scatter(
x=np.concatenate([date_index, date_index[::-1]]),
y=np.concatenate([y_train, ols_predictions[::-1]]),
fill='toself',
fillcolor='rgba(173, 216, 230, 0.3)', # LightBlue with transparency
line=dict(color='rgba(255,255,255,0)'),
hoverinfo='skip',
name='Prediction Error',
showlegend=True,
))
# Layout styling (black background, white text)
fig.update_layout(
height=500, width=900,
margin=dict(l=50, r=40, t=40, b=40),
plot_bgcolor='black',
paper_bgcolor='black',
font=dict(color='white'),
xaxis_title='Date',
yaxis_title='QoQ Non-Oil GDP Growth (%)',
legend=dict(orientation='h', y=1.05, x=0.5, xanchor='center', yanchor='bottom')
)
# Axes
fig.update_yaxes(showgrid=False, zeroline=False, gridcolor='lightgray', dtick=2)
fig.update_xaxes(showgrid=False)
fig.show()
Stationarity Check
The correlation and OLS regression analyses helped me uncover static relationships between GDP and the other indicators. But from the results, it was clear that these methods don’t fully capture the dynamic and interconnected nature of macroeconomic variables over time. On top of that, there was also autocorrelation in the residuals.
To address this, I moved to a Vector Autoregression (VAR) model. Unlike OLS, VAR can account for lagged interactions between multiple variables and can also generate impulse response functions, which are very useful for measuring the effects of economic shocks.
Before applying VAR or running Granger causality tests, it was important to check whether the variables were stationary, meaning their statistical properties (like mean and variance) don’t change over time. If they weren’t, I had to difference them to stabilize the series.
These steps are standard in macroeconomic modeling and are commonly used in analyses including those involving GCC economies, as shown in studies by Bentour & Fund (2022), Magazzino (2016), and Kireyev (2000)
Initially, some variables such as PMI, residential prices, and international visitors were transformed when I retrieved them in the very first step. These transformations typically help stabilize the mean and variance over time and could potentially have made them stationary.
However, to formally assess this, we can apply statistical tests such as the Augmented Dickey-Fuller (ADF) test, which helps determine the presence of a unit root and whether a series is stationary.
At its core, the ADF test checks whether the data has a kind of “memory.”
Think of it like this: imagine a drunk man walking. If he stumbles randomly in any direction with no tendency to return to where he started, he’ll likely drift further and further away over time — just like a non-stationary series that doesn’t settle around a stable average. But if there’s a rope tied to his waist pulling him gently back toward a lamppost, he’ll wander, but not too far — this is like a stationary series that fluctuates around a long-term mean.
The ADF test essentially asks: Is there a “pull-back” force in the data, or is it just wandering off forever?
(Credit to ChatGPT for making statistics feel like storytelling.)
Stationarity Test Results
Using the Augmented Dickey-Fuller (ADF) test, I found that:
- Personal Lending, Business Lending, and Visitor Inflows were already stationary (p-values < 0.05), so they didn’t require any transformation.
- Non-Oil GDP, PMI, and Residential Sales Index were non-stationary in their level form.
pmi_quarterly.info()
<class 'pandas.core.frame.DataFrame'>
PeriodIndex: 44 entries, 2014-03 to 2024-12
Freq: M
Data columns (total 1 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 PMI 44 non-null float64
dtypes: float64(1)
memory usage: 1.7 KB
from statsmodels.tsa.stattools import adfuller
# Create a dictionary mapping names to the DataFrames
dataframes = {
"Non Oil GDP": ngdp,
"PMI": pmi_quarterly,
"Personal Lending": personal,
"Business Lending": business,
"Residential Sales Index": residential_log,
"Monthly Visitors": visitors_log,
}
def adf_test(series, signif=0.05, name=''):
# Remove missing values
series = series.dropna()
print(f"ADF Test for {name}:")
# Perform ADF test with automatic lag selection via AIC
result = adfuller(series, autolag='AIC')
# Extract test statistic, p-value, and critical values
test_statistic, p_value, usedlag, n_obs, crit_values, icbest = result
print(f" Test Statistic : {test_statistic:.4f}")
print(f" p-value : {p_value:.4f}")
print(f" # Lags Used : {usedlag}")
print(f" # Observations : {n_obs}")
for key, value in crit_values.items():
print(f" Critical Value ({key}) : {value:.4f}")
# Interpretation: if p-value is less than significance, we reject H0
if p_value <= signif:
print(" => Reject the null hypothesis. The series is stationary.")
else:
print(" => Fail to reject the null hypothesis. The series is non-stationary.")
print("-" * 60)
# Loop through each series in the dictionary and apply the ADF test.
for name, df in dataframes.items():
# Assuming the first (and only) column is your time series data.
adf_test(df.iloc[:, 0], name=name)
ADF Test for Non Oil GDP:
Test Statistic : -1.6482
p-value : 0.4580
# Lags Used : 4
# Observations : 37
Critical Value (1%) : -3.6209
Critical Value (5%) : -2.9435
Critical Value (10%) : -2.6104
=> Fail to reject the null hypothesis. The series is non-stationary.
------------------------------------------------------------
ADF Test for PMI:
Test Statistic : -2.1896
p-value : 0.2100
# Lags Used : 0
# Observations : 43
Critical Value (1%) : -3.5925
Critical Value (5%) : -2.9315
Critical Value (10%) : -2.6041
=> Fail to reject the null hypothesis. The series is non-stationary.
------------------------------------------------------------
ADF Test for Personal Lending:
Test Statistic : -3.2917
p-value : 0.0153
# Lags Used : 0
# Observations : 43
Critical Value (1%) : -3.5925
Critical Value (5%) : -2.9315
Critical Value (10%) : -2.6041
=> Reject the null hypothesis. The series is stationary.
------------------------------------------------------------
ADF Test for Business Lending:
Test Statistic : -2.9727
p-value : 0.0375
# Lags Used : 0
# Observations : 43
Critical Value (1%) : -3.5925
Critical Value (5%) : -2.9315
Critical Value (10%) : -2.6041
=> Reject the null hypothesis. The series is stationary.
------------------------------------------------------------
ADF Test for Residential Sales Index:
Test Statistic : -1.4082
p-value : 0.5783
# Lags Used : 4
# Observations : 39
Critical Value (1%) : -3.6104
Critical Value (5%) : -2.9391
Critical Value (10%) : -2.6081
=> Fail to reject the null hypothesis. The series is non-stationary.
------------------------------------------------------------
ADF Test for Monthly Visitors:
Test Statistic : -7.3270
p-value : 0.0000
# Lags Used : 1
# Observations : 41
Critical Value (1%) : -3.6010
Critical Value (5%) : -2.9351
Critical Value (10%) : -2.6060
=> Reject the null hypothesis. The series is stationary.
------------------------------------------------------------
To determine how many times to difference these non-stationary series, I used the .ndiffs() function from the pmdarima package. This function estimates the minimum number of differences needed to make a series stationary, based on statistical unit root tests.
import pmdarima as pm
# Create a dictionary for the series that need to be differenced
datasets_to_diff = {
"Non-Oil GDP": ngdp,
"PMI": pmi_quarterly,
"Residential Sales Index": residential_log,
}
# Loop through each dataset and determine the required number of differences
for name, df in datasets_to_diff.items():
# Assuming the series is in the first column
series = df.iloc[:, 0].dropna()
d = pm.arima.ndiffs(series, test='adf') # 'adf' is the default test
print(f"{name} requires {d} difference(s) to achieve stationarity based on the ADF test.")
Non-Oil GDP requires 1 difference(s) to achieve stationarity based on the ADF test.
PMI requires 1 difference(s) to achieve stationarity based on the ADF test.
Residential Sales Index requires 1 difference(s) to achieve stationarity based on the ADF test.
Based on this method, a first-order difference was found to be sufficient for the three non-stationary series.
After applying the differencing, I reran the ADF test and confirmed that the series were now stationary.
# Create a dictionary mapping each series name to the number of differences needed.
diff_order = {
"Non-Oil GDP": 1,
"Residential Sales Index": 1,
"PMI": 1,
}
# Function to apply differencing d times on a Pandas Series
def difference_series(series, d):
diff_series = series.copy()
for i in range(d):
diff_series = diff_series.diff().dropna()
return diff_series
# Dictionary to store the differenced series
differenced_series = {}
# Apply differencing based on the required order for each dataset
for name, df in datasets_to_diff.items():
# Extract the series from the first column and drop missing values
series = df.iloc[:, 0].dropna()
d = diff_order[name]
if d > 0:
differenced_series[name] = difference_series(series, d)
else:
# If no differencing is needed, store the original series
differenced_series[name] = series
# Function to run the ADF test and print results like before
def adf_test(series, signif=0.05, name=''):
series = series.dropna()
result = adfuller(series, autolag='AIC')
test_statistic, p_value, usedlag, n_obs, crit_values, icbest = result
print(f"ADF Test for {name}:")
print(f" Test Statistic : {test_statistic:.4f}")
print(f" p-value : {p_value:.4f}")
print(f" # Lags Used : {usedlag}")
print(f" # Observations : {n_obs}")
for key, value in crit_values.items():
print(f" Critical Value ({key}) : {value:.4f}")
if p_value <= signif:
print(" => Reject the null hypothesis. The series is stationary.")
else:
print(" => Fail to reject the null hypothesis. The series is non-stationary.")
print("-" * 60)
# Run the ADF test on each differenced series
for name, series in differenced_series.items():
adf_test(series, name=name)
ADF Test for Non-Oil GDP:
Test Statistic : -5.4970
p-value : 0.0000
# Lags Used : 3
# Observations : 37
Critical Value (1%) : -3.6209
Critical Value (5%) : -2.9435
Critical Value (10%) : -2.6104
=> Reject the null hypothesis. The series is stationary.
------------------------------------------------------------
ADF Test for PMI:
Test Statistic : -6.5476
p-value : 0.0000
# Lags Used : 0
# Observations : 42
Critical Value (1%) : -3.5966
Critical Value (5%) : -2.9333
Critical Value (10%) : -2.6050
=> Reject the null hypothesis. The series is stationary.
------------------------------------------------------------
ADF Test for Residential Sales Index:
Test Statistic : -5.0959
p-value : 0.0000
# Lags Used : 3
# Observations : 39
Critical Value (1%) : -3.6104
Critical Value (5%) : -2.9391
Critical Value (10%) : -2.6081
=> Reject the null hypothesis. The series is stationary.
------------------------------------------------------------
# Renaming the new differenced variables
ngdp_diff = differenced_series["Non-Oil GDP"].to_frame(name="Non-Oil GDP")
residential_diff = differenced_series["Residential Sales Index"].to_frame(name="Residential Sales Index")
pmi_diff = differenced_series["PMI"].to_frame(name="PMI")
Granger Causality Analysis
After ensuring stationarity, I used the Granger causality test, a statistical method that determines whether past values of an independent variable help predict future values of a dependent variable beyond what is already explained by the dependent variable’s own past values (Granger, 1969).
Using an analogy again,
- Imagine you’re trying to predict whether people will carry umbrellas tomorrow.
- You have two pieces of historical information:
- Past Weather Data – Did it rain recently?
- Past Umbrella Usage – Did people carry umbrellas recently?
If knowing the history of rainfall improves your prediction of whether people will carry umbrellas tomorrow — beyond just looking at past umbrella usage — then we say that rain “Granger-causes” umbrella usage.
It’s important to note that Granger causality isn’t about real-world causation. It simply tests whether one variable’s history improves the prediction of another. In other words, it asks: “Does adding this information make my forecast better?”
You can have Granger causality without true cause-and-effect relationships in the real world.
From the results, based on the ssr-based F-test p-values:
- Residential Sales Price Index significantly Granger-causes GDP growth at shorter lags (1–3 quarters).
- UAE PMI, tourism inflows, and personal lending demand were significant at lag 4, suggesting longer-term predictive power.
- Business lending demand did not show statistically significant Granger causality at any lag.
These insights directly answered the question: “Does adding this information make my forecast better?” They helped identify which variables were worth including in the final VAR model, ensuring that only the most relevant predictors contributed to improving forecast accuracy.
from statsmodels.tsa.stattools import grangercausalitytests
def merge_on_date(series1, series2, col1_name, col2_name):
    """
    Convert two series to DataFrames (if necessary) and merge them on the index (date) using an inner join.
    Returns a DataFrame with columns [col1_name, col2_name].
    """
    # Convert to DataFrame if not already one.
    df1 = series1.to_frame(name=col1_name) if not isinstance(series1, pd.DataFrame) else series1.copy()
    df2 = series2.to_frame(name=col2_name) if not isinstance(series2, pd.DataFrame) else series2.copy()
    # Merge on the index (date) using an inner join.
    merged_df = pd.merge(df1, df2, left_index=True, right_index=True, how='inner')
    return merged_df
maxlag = 4 # Maximum lag to test
# Dictionary of indicators (ensure these series are aligned appropriately)
indicators = {
"Residential Sales Index": residential_diff,
"Visitors": visitors_log,
"PMI": pmi_diff,
"Personal": personal,
"Business": business,
}
print("Granger Causality Tests: GDP vs Indicators")
for indicator_name, indicator_series in indicators.items():
    # Merge GDP_diff and the indicator by date
    df_test = merge_on_date(ngdp_diff, indicator_series, "Non-Oil GDP", indicator_name)
    n_obs = df_test.shape[0]
    required_obs = 3 * maxlag + 1  # Minimum required observations (13 for maxlag=4)
    print(f"\nTesting if {indicator_name} Granger-causes Non-Oil GDP:")
    print(f"Number of observations: {n_obs} (required at least {required_obs})")
    if n_obs < required_obs:
        print(f"Skipping {indicator_name} due to insufficient observations.")
        continue
    grangercausalitytests(df_test, maxlag=maxlag, verbose=True)
Granger Causality Tests: GDP vs Indicators
Testing if Residential Sales Index Granger-causes Non-Oil GDP:
Number of observations: 41 (required at least 13)
Granger Causality
number of lags (no zero) 1
ssr based F test: F=5.0774 , p=0.0303 , df_denom=37, df_num=1
ssr based chi2 test: chi2=5.4890 , p=0.0191 , df=1
likelihood ratio test: chi2=5.1437 , p=0.0233 , df=1
parameter F test: F=5.0774 , p=0.0303 , df_denom=37, df_num=1
Granger Causality
number of lags (no zero) 2
ssr based F test: F=3.4687 , p=0.0426 , df_denom=34, df_num=2
ssr based chi2 test: chi2=7.9577 , p=0.0187 , df=2
likelihood ratio test: chi2=7.2417 , p=0.0268 , df=2
parameter F test: F=3.4687 , p=0.0426 , df_denom=34, df_num=2
Granger Causality
number of lags (no zero) 3
ssr based F test: F=4.5753 , p=0.0091 , df_denom=31, df_num=3
ssr based chi2 test: chi2=16.8254 , p=0.0008 , df=3
likelihood ratio test: chi2=13.9296 , p=0.0030 , df=3
parameter F test: F=4.5753 , p=0.0091 , df_denom=31, df_num=3
Granger Causality
number of lags (no zero) 4
ssr based F test: F=2.3178 , p=0.0818 , df_denom=28, df_num=4
ssr based chi2 test: chi2=12.2514 , p=0.0156 , df=4
likelihood ratio test: chi2=10.5827 , p=0.0317 , df=4
parameter F test: F=2.3178 , p=0.0818 , df_denom=28, df_num=4
Testing if Visitors Granger-causes Non-Oil GDP:
Number of observations: 41 (required at least 13)
Granger Causality
number of lags (no zero) 1
ssr based F test: F=0.2794 , p=0.6002 , df_denom=37, df_num=1
ssr based chi2 test: chi2=0.3021 , p=0.5826 , df=1
likelihood ratio test: chi2=0.3010 , p=0.5833 , df=1
parameter F test: F=0.2794 , p=0.6002 , df_denom=37, df_num=1
Granger Causality
number of lags (no zero) 2
ssr based F test: F=0.5162 , p=0.6014 , df_denom=34, df_num=2
ssr based chi2 test: chi2=1.1843 , p=0.5531 , df=2
likelihood ratio test: chi2=1.1666 , p=0.5580 , df=2
parameter F test: F=0.5162 , p=0.6014 , df_denom=34, df_num=2
Granger Causality
number of lags (no zero) 3
ssr based F test: F=2.5294 , p=0.0754 , df_denom=31, df_num=3
ssr based chi2 test: chi2=9.3015 , p=0.0255 , df=3
likelihood ratio test: chi2=8.3203 , p=0.0398 , df=3
parameter F test: F=2.5294 , p=0.0754 , df_denom=31, df_num=3
Granger Causality
number of lags (no zero) 4
ssr based F test: F=3.3133 , p=0.0242 , df_denom=28, df_num=4
ssr based chi2 test: chi2=17.5132 , p=0.0015 , df=4
likelihood ratio test: chi2=14.3384 , p=0.0063 , df=4
parameter F test: F=3.3133 , p=0.0242 , df_denom=28, df_num=4
Testing if PMI Granger-causes Non-Oil GDP:
Number of observations: 41 (required at least 13)
Granger Causality
number of lags (no zero) 1
ssr based F test: F=0.5548 , p=0.4611 , df_denom=37, df_num=1
ssr based chi2 test: chi2=0.5998 , p=0.4387 , df=1
likelihood ratio test: chi2=0.5953 , p=0.4404 , df=1
parameter F test: F=0.5548 , p=0.4611 , df_denom=37, df_num=1
Granger Causality
number of lags (no zero) 2
ssr based F test: F=0.8240 , p=0.4473 , df_denom=34, df_num=2
ssr based chi2 test: chi2=1.8903 , p=0.3886 , df=2
likelihood ratio test: chi2=1.8459 , p=0.3974 , df=2
parameter F test: F=0.8240 , p=0.4473 , df_denom=34, df_num=2
Granger Causality
number of lags (no zero) 3
ssr based F test: F=2.4156 , p=0.0853 , df_denom=31, df_num=3
ssr based chi2 test: chi2=8.8832 , p=0.0309 , df=3
likelihood ratio test: chi2=7.9828 , p=0.0464 , df=3
parameter F test: F=2.4156 , p=0.0853 , df_denom=31, df_num=3
Granger Causality
number of lags (no zero) 4
ssr based F test: F=5.8537 , p=0.0015 , df_denom=28, df_num=4
ssr based chi2 test: chi2=30.9408 , p=0.0000 , df=4
likelihood ratio test: chi2=22.4856 , p=0.0002 , df=4
parameter F test: F=5.8537 , p=0.0015 , df_denom=28, df_num=4
Testing if Personal Granger-causes Non-Oil GDP:
Number of observations: 41 (required at least 13)
Granger Causality
number of lags (no zero) 1
ssr based F test: F=1.5025 , p=0.2280 , df_denom=37, df_num=1
ssr based chi2 test: chi2=1.6244 , p=0.2025 , df=1
likelihood ratio test: chi2=1.5923 , p=0.2070 , df=1
parameter F test: F=1.5025 , p=0.2280 , df_denom=37, df_num=1
Granger Causality
number of lags (no zero) 2
ssr based F test: F=2.7995 , p=0.0749 , df_denom=34, df_num=2
ssr based chi2 test: chi2=6.4224 , p=0.0403 , df=2
likelihood ratio test: chi2=5.9453 , p=0.0512 , df=2
parameter F test: F=2.7995 , p=0.0749 , df_denom=34, df_num=2
Granger Causality
number of lags (no zero) 3
ssr based F test: F=2.4942 , p=0.0783 , df_denom=31, df_num=3
ssr based chi2 test: chi2=9.1724 , p=0.0271 , df=3
likelihood ratio test: chi2=8.2164 , p=0.0417 , df=3
parameter F test: F=2.4942 , p=0.0783 , df_denom=31, df_num=3
Granger Causality
number of lags (no zero) 4
ssr based F test: F=3.2777 , p=0.0253 , df_denom=28, df_num=4
ssr based chi2 test: chi2=17.3250 , p=0.0017 , df=4
likelihood ratio test: chi2=14.2105 , p=0.0067 , df=4
parameter F test: F=3.2777 , p=0.0253 , df_denom=28, df_num=4
Testing if Business Granger-causes Non-Oil GDP:
Number of observations: 41 (required at least 13)
Granger Causality
number of lags (no zero) 1
ssr based F test: F=0.3368 , p=0.5652 , df_denom=37, df_num=1
ssr based chi2 test: chi2=0.3641 , p=0.5462 , df=1
likelihood ratio test: chi2=0.3625 , p=0.5471 , df=1
parameter F test: F=0.3368 , p=0.5652 , df_denom=37, df_num=1
Granger Causality
number of lags (no zero) 2
ssr based F test: F=1.6563 , p=0.2059 , df_denom=34, df_num=2
ssr based chi2 test: chi2=3.7998 , p=0.1496 , df=2
likelihood ratio test: chi2=3.6259 , p=0.1632 , df=2
parameter F test: F=1.6563 , p=0.2059 , df_denom=34, df_num=2
Granger Causality
number of lags (no zero) 3
ssr based F test: F=1.2499 , p=0.3086 , df_denom=31, df_num=3
ssr based chi2 test: chi2=4.5965 , p=0.2038 , df=3
likelihood ratio test: chi2=4.3390 , p=0.2271 , df=3
parameter F test: F=1.2499 , p=0.3086 , df_denom=31, df_num=3
Granger Causality
number of lags (no zero) 4
ssr based F test: F=0.5135 , p=0.7263 , df_denom=28, df_num=4
ssr based chi2 test: chi2=2.7143 , p=0.6067 , df=4
likelihood ratio test: chi2=2.6193 , p=0.6234 , df=4
parameter F test: F=0.5135 , p=0.7263 , df_denom=28, df_num=4
Vector Autoregression (VAR)
The Vector Autoregression (VAR) framework allows all variables in the system to be treated as endogenous, capturing both direct and indirect effects across multiple time lags. This makes it particularly well-suited for analyzing the dynamic interactions among the indicators and non-oil GDP growth in complex economic systems (Sims, 1980).
To understand this better, imagine a group of friends who constantly influence each other’s decisions.
- If one friend starts going to the gym, a few months later, another might pick up the habit.
- A third friend might start eating healthier in response to seeing both friends making positive lifestyle changes.
In this case, no one friend is the sole influencer; they all react to and affect each other over time.
Similarly, in a VAR model, every variable is treated as both an influencer and a responder. It doesn’t assume that one variable is always the cause and the others are always the effect. Instead, it captures how they all interact and influence each other dynamically across different time lags.
I have used a similar model before, the ARIMA model, which forecasts how a single variable (in my case, the stock price of $PG) changes over time by learning from its own past behavior. In contrast, the VAR model looks at how multiple variables influence each other over time, treating all of them as both causes and effects within the system.
Lag Order Selection
Before fitting the VAR model, I needed to choose how many past quarters (lags) to use. This is important because too few lags might miss delayed effects, while too many can overfit the data.
To stay consistent, I set the maximum lag to 4 in the lag selection process since it’s the highest lag I’ve been using throughout my earlier analysis (e.g., Granger causality). I then used the .select_order() function from the statsmodels library, which evaluates different lag lengths (from 1 to 4 in this case) and identifies the best lag order based on statistical scoring rules.
To decide, I used information criteria: statistical tools for model selection that balance fit against complexity by penalizing excessive parameters, helping to prevent overfitting. I used four common ones:
- Akaike Information Criterion (AIC)
- Bayesian Information Criterion (BIC)
- Final Prediction Error (FPE)
- Hannan-Quinn Criterion (HQIC)
You can think of it like four friends trying to pick a restaurant. They all have slightly different preferences, one wants cheap, another wants healthy, one likes variety, and another wants a short wait time. While they might not always agree, if all four pick the same place, it’s probably a solid choice.
In this case, all four criteria pointed to lag 4 as the best option. So I used 4 lags in the VAR model.
# Join all variables based on the datetime index
var_df = ngdp_diff.join([
    residential_diff,
    visitors_log,
    personal,
    pmi_diff
])

# Drop any rows with missing values
var_df.dropna(inplace=True)

from statsmodels.tsa.api import VAR

# Create VAR model object
model = VAR(var_df)

# Select optimal lag length (information criteria for lags 0-4)
lag_selection = model.select_order(maxlags=4)
lag_selection.summary()

| Lag | AIC | BIC | FPE | HQIC |
|---|---|---|---|---|
| 0 | 0.9565 | 1.174 | 2.603 | 1.033 |
| 1 | 0.5895 | 1.896 | 1.829 | 1.050 |
| 2 | -0.3719 | 2.023 | 0.7563 | 0.4723 |
| 3 | -1.380 | 2.103 | 0.3410 | -0.1525 |
| 4 | -3.401* | 1.171* | 0.07155* | -1.789* |
VAR Results
Based on the Granger causality results, only variables that showed statistically significant predictive power for non-oil GDP at some lag length were included in the VAR model. Using four lags, the model included five endogenous variables and was estimated over 37 quarterly observations.
The VAR model generates a system of five equations, one for each variable in the system, but for my analysis the focus was specifically on the Non-Oil GDP equation, as it was the primary variable of interest.
\[ \text{GDP}_t = \alpha + \sum_{i=1}^{4} \beta_i \text{GDP}_{t-i} + \sum_{i=1}^{4} \gamma_i \text{RESI}_{t-i} + \sum_{i=1}^{4} \delta_i \text{VIS}_{t-i} + \sum_{i=1}^{4} \phi_i \text{PLD}_{t-i} + \sum_{i=1}^{4} \theta_i \text{PMI}_{t-i} + \varepsilon_t \]
Where:
- \(\text{GDP}_t\) = Non-Oil GDP growth at time \(t\)
- \(\text{RESI}\) = Residential Sales Index
- \(\text{VIS}\) = Visitor Inflows
- \(\text{PLD}\) = Personal Lending Demand
- \(\text{PMI}\) = UAE Manufacturing PMI
- \(\varepsilon_t\) = error term
Understanding the VAR Equation for Non-Oil GDP
This equation is one part of a system of five equations in the VAR model. It describes how the value of Non-Oil GDP at time \(t\) is influenced by:
Its own past values: GDP from 1 to 4 periods ago. These are the terms \(GDP_{t-1},\ GDP_{t-2},\ \dots\)
Lagged values of the other variables:
- Residential Sales Index
- Visitors
- Personal Lending Demand
- PMI
Each of these variables enters the equation with 4 lags, meaning the model uses their values from the past four time periods to help predict current GDP.
The constant term \(\alpha\): this is the baseline level of GDP if all other influences were zero (the intercept).
The error term \(\varepsilon_t\): captures any random shocks or unexplained changes in GDP.
We’re trying to predict today’s Non-Oil GDP using a combination of:
- what GDP was in the last 4 periods, and
- what residential sales, visitors, lending, and PMI were in the last 4 periods.
In the Non-Oil GDP equation, most variables were not statistically significant at the 5% level, except for PMI at lag 4, which had a negative and significant effect (p = 0.023).
Other variables such as the Residential Sales Index, Personal Lending, and Visitor Numbers showed some larger coefficients and borderline significance at longer lags, but none passed the usual 5% threshold in the GDP equation.
Overall, the VAR results suggest that UAE Manufacturing PMI is the only reliable short-term predictor of non-oil GDP growth within this model. The other indicators may influence GDP more indirectly, with longer delays, or may be affected by structural factors outside the model’s scope.
\[ \text{GDP}_t = 0.027 - 0.002 \cdot \text{GDP}_{t-1} + 19.06 \cdot \text{RESI}_{t-1} + 0.17 \cdot \text{VIS}_{t-1} + 0.10 \cdot \text{PLD}_{t-1} + 0.17 \cdot \text{PMI}_{t-1} + \dots + \varepsilon_t \]
(Full coefficients omitted for visual clarity. See table for all terms.)
# Fit the model at lag 4
results = model.fit(4)

# Print summary of the model
results.summary()
Summary of Regression Results
==================================
Model: VAR
Method: OLS
Date: Sun, 11, May, 2025
Time: 17:03:12
--------------------------------------------------------------------
No. of Equations: 5.00000 BIC: 1.17060
Nobs: 37.0000 HQIC: -1.78924
Log likelihood: -94.5866 FPE: 0.0715525
AIC: -3.40092 Det(Omega_mle): 0.00755951
--------------------------------------------------------------------
Results for equation Non-Oil GDP
=============================================================================================
coefficient std. error t-stat prob
---------------------------------------------------------------------------------------------
const 0.027202 0.826879 0.033 0.974
L1.Non-Oil GDP -0.002335 0.262790 -0.009 0.993
L1.Residential Sales Index 19.063571 57.052020 0.334 0.738
L1.Visitors 0.172285 0.333261 0.517 0.605
L1.Personal Lending Demand 0.097513 0.074224 1.314 0.189
L1.PMI 0.171908 0.252630 0.680 0.496
L2.Non-Oil GDP -0.057310 0.172958 -0.331 0.740
L2.Residential Sales Index -68.315618 64.140023 -1.065 0.287
L2.Visitors 0.179809 0.428840 0.419 0.675
L2.Personal Lending Demand -0.040749 0.070511 -0.578 0.563
L2.PMI 0.375576 0.260185 1.443 0.149
L3.Non-Oil GDP 0.096099 0.179300 0.536 0.592
L3.Residential Sales Index -74.070909 51.241778 -1.446 0.148
L3.Visitors 0.314494 0.423607 0.742 0.458
L3.Personal Lending Demand 0.043716 0.082972 0.527 0.598
L3.PMI 0.160643 0.341211 0.471 0.638
L4.Non-Oil GDP -0.142234 0.177412 -0.802 0.423
L4.Residential Sales Index 61.063097 43.303181 1.410 0.159
L4.Visitors -0.018911 0.307582 -0.061 0.951
L4.Personal Lending Demand -0.116549 0.076646 -1.521 0.128
L4.PMI -0.978764 0.431754 -2.267 0.023
=============================================================================================
Results for equation Residential Sales Index
=============================================================================================
coefficient std. error t-stat prob
---------------------------------------------------------------------------------------------
const 0.001816 0.003014 0.603 0.547
L1.Non-Oil GDP 0.001864 0.000958 1.946 0.052
L1.Residential Sales Index -0.323274 0.207977 -1.554 0.120
L1.Visitors -0.000000 0.001215 -0.000 1.000
L1.Personal Lending Demand 0.000073 0.000271 0.270 0.787
L1.PMI 0.000650 0.000921 0.706 0.480
L2.Non-Oil GDP 0.001275 0.000631 2.022 0.043
L2.Residential Sales Index 0.140876 0.233816 0.603 0.547
L2.Visitors 0.001011 0.001563 0.647 0.518
L2.Personal Lending Demand -0.000484 0.000257 -1.884 0.060
L2.PMI 0.000180 0.000948 0.190 0.850
L3.Non-Oil GDP 0.001302 0.000654 1.992 0.046
L3.Residential Sales Index 0.164991 0.186797 0.883 0.377
L3.Visitors -0.000285 0.001544 -0.184 0.854
L3.Personal Lending Demand 0.000180 0.000302 0.596 0.551
L3.PMI -0.002906 0.001244 -2.336 0.019
L4.Non-Oil GDP -0.000702 0.000647 -1.085 0.278
L4.Residential Sales Index -0.420014 0.157857 -2.661 0.008
L4.Visitors -0.000342 0.001121 -0.305 0.761
L4.Personal Lending Demand 0.000126 0.000279 0.449 0.653
L4.PMI 0.000006 0.001574 0.004 0.997
=============================================================================================
Results for equation Visitors
=============================================================================================
coefficient std. error t-stat prob
---------------------------------------------------------------------------------------------
const 0.148540 0.745333 0.199 0.842
L1.Non-Oil GDP 0.288083 0.236874 1.216 0.224
L1.Residential Sales Index -51.840243 51.425610 -1.008 0.313
L1.Visitors -1.121010 0.300395 -3.732 0.000
L1.Personal Lending Demand 0.055358 0.066904 0.827 0.408
L1.PMI 0.385779 0.227716 1.694 0.090
L2.Non-Oil GDP 0.103947 0.155901 0.667 0.505
L2.Residential Sales Index 0.437067 57.814602 0.008 0.994
L2.Visitors -0.878064 0.386548 -2.272 0.023
L2.Personal Lending Demand -0.067085 0.063558 -1.056 0.291
L2.PMI 0.612824 0.234526 2.613 0.009
L3.Non-Oil GDP -0.076189 0.161617 -0.471 0.637
L3.Residential Sales Index -49.012700 46.188368 -1.061 0.289
L3.Visitors -0.539115 0.381831 -1.412 0.158
L3.Personal Lending Demand -0.014622 0.074789 -0.196 0.845
L3.PMI 1.025898 0.307561 3.336 0.001
L4.Non-Oil GDP 0.028380 0.159916 0.177 0.859
L4.Residential Sales Index -5.494405 39.032667 -0.141 0.888
L4.Visitors -0.230364 0.277249 -0.831 0.406
L4.Personal Lending Demand 0.026006 0.069088 0.376 0.707
L4.PMI -0.475725 0.389175 -1.222 0.222
=============================================================================================
Results for equation Personal Lending Demand
=============================================================================================
coefficient std. error t-stat prob
---------------------------------------------------------------------------------------------
const 0.579003 2.556480 0.226 0.821
L1.Non-Oil GDP 0.549041 0.812472 0.676 0.499
L1.Residential Sales Index 162.945998 176.389056 0.924 0.356
L1.Visitors 0.209060 1.030350 0.203 0.839
L1.Personal Lending Demand 0.528909 0.229481 2.305 0.021
L1.PMI 0.701368 0.781063 0.898 0.369
L2.Non-Oil GDP -0.673903 0.534738 -1.260 0.208
L2.Residential Sales Index -42.973955 198.303200 -0.217 0.828
L2.Visitors 0.291074 1.325854 0.220 0.826
L2.Personal Lending Demand 0.071974 0.218002 0.330 0.741
L2.PMI 2.682019 0.804421 3.334 0.001
L3.Non-Oil GDP 0.444805 0.554345 0.802 0.422
L3.Residential Sales Index 103.337438 158.425395 0.652 0.514
L3.Visitors 0.334854 1.309676 0.256 0.798
L3.Personal Lending Demand 0.008650 0.256526 0.034 0.973
L3.PMI -1.593701 1.054929 -1.511 0.131
L4.Non-Oil GDP -1.252175 0.548509 -2.283 0.022
L4.Residential Sales Index -120.642043 133.881450 -0.901 0.368
L4.Visitors 0.922064 0.950959 0.970 0.332
L4.Personal Lending Demand 0.416187 0.236969 1.756 0.079
L4.PMI -1.760560 1.334865 -1.319 0.187
=============================================================================================
Results for equation PMI
=============================================================================================
coefficient std. error t-stat prob
---------------------------------------------------------------------------------------------
const 0.136332 0.844292 0.161 0.872
L1.Non-Oil GDP -0.012271 0.268324 -0.046 0.964
L1.Residential Sales Index -24.650173 58.253477 -0.423 0.672
L1.Visitors -0.125821 0.340279 -0.370 0.712
L1.Personal Lending Demand 0.116343 0.075788 1.535 0.125
L1.PMI 0.073901 0.257950 0.286 0.775
L2.Non-Oil GDP 0.038020 0.176600 0.215 0.830
L2.Residential Sales Index -34.755131 65.490746 -0.531 0.596
L2.Visitors -0.196072 0.437871 -0.448 0.654
L2.Personal Lending Demand -0.065681 0.071996 -0.912 0.362
L2.PMI -0.014077 0.265665 -0.053 0.958
L3.Non-Oil GDP 0.028025 0.183076 0.153 0.878
L3.Residential Sales Index 4.168953 52.320877 0.080 0.936
L3.Visitors -0.020181 0.432528 -0.047 0.963
L3.Personal Lending Demand 0.068121 0.084719 0.804 0.421
L3.PMI -0.138161 0.348396 -0.397 0.692
L4.Non-Oil GDP -0.032776 0.181148 -0.181 0.856
L4.Residential Sales Index 62.009461 44.215101 1.402 0.161
L4.Visitors 0.129284 0.314059 0.412 0.681
L4.Personal Lending Demand -0.144540 0.078260 -1.847 0.065
L4.PMI -0.155714 0.440847 -0.353 0.724
=============================================================================================
Correlation matrix of residuals
Non-Oil GDP Residential Sales Index Visitors Personal Lending Demand PMI
Non-Oil GDP 1.000000 0.060753 0.761382 0.093986 0.211417
Residential Sales Index 0.060753 1.000000 -0.424514 0.334433 0.383702
Visitors 0.761382 -0.424514 1.000000 -0.127675 0.110087
Personal Lending Demand 0.093986 0.334433 -0.127675 1.000000 0.138265
PMI 0.211417 0.383702 0.110087 0.138265 1.000000
from statsmodels.stats.stattools import jarque_bera

# DataFrame containing residuals for each VAR equation
residuals = results.resid

# Run the Jarque-Bera normality test on the GDP residuals
jb_stat, jb_pvalue, skew, kurtosis = jarque_bera(residuals['Non-Oil GDP'])

# Print results
print(f"Jarque-Bera Statistic: {jb_stat:.3f}")
print(f"P-value: {jb_pvalue:.3f}")
print(f"Skewness: {skew:.3f}")
print(f"Kurtosis: {kurtosis:.3f}")

# Interpretation
if jb_pvalue > 0.05:
    print("Residuals appear normally distributed (fail to reject H0).")
else:
    print("Residuals do not appear normally distributed (reject H0).")
Jarque-Bera Statistic: 0.880
P-value: 0.644
Skewness: -0.264
Kurtosis: 2.459
Residuals appear normally distributed (fail to reject H0).
Model Evaluation
To fairly evaluate the VAR model’s forecasting performance and compare it with the earlier OLS model, I first needed to reverse the differencing process. Since the VAR was estimated on differenced data to ensure stationarity, its predictions are also in “changes” rather than actual GDP growth values.
To get meaningful evaluation metrics (like MAE or RMSE), I reconstructed the predicted levels of GDP by adding back the previous values, essentially “undifferencing” the series. This allowed me to compare the VAR model’s predicted GDP growth values against the actual GDP growth rates in their original form, as shown in similar workflows like Kale (2020).
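The undifferencing step itself is just a cumulative sum. A tiny illustration with made-up numbers:

```python
import numpy as np

# Given the level just before the forecast window and the model's
# predicted first differences, cumulatively add the differences back
# to recover the levels.
initial_level = 3.0                        # hypothetical last observed value
predicted_diffs = np.array([0.5, -0.2, 0.1])

levels = initial_level + np.cumsum(predicted_diffs)
print(levels)  # recovers 3.5, 3.3, 3.4 (up to float rounding)
```

The loop in the evaluation code below does the same thing one step at a time.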
Evaluation Metrics (VAR Model)
- Mean Absolute Error (MAE): 2.68
- Root Mean Squared Error (RMSE): 3.15
- Mean Absolute Percentage Error (MAPE): 182.2%
- R-squared (R²): 0.480
The R² is comparable to the OLS model’s, but the errors (MAE, RMSE, MAPE) are slightly higher. The model didn’t perform strongly: it explained about 48% of the variation in non-oil GDP growth (R² = 0.480), the same as the linear model, but with a much higher MAPE (182.2%), indicating limited accuracy.
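One caveat worth noting: MAPE explodes whenever actual growth is close to zero, which inflates the 182.2% figure here. A symmetric MAPE (sMAPE) is a common bounded alternative; this is a side note of mine, not part of the original evaluation, and the numbers below are made up for illustration:

```python
import numpy as np

def smape(actual, predicted):
    """Symmetric MAPE: bounded above by 200% and far less explosive
    when actual values pass near zero."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return 100 * np.mean(2 * np.abs(predicted - actual)
                         / (np.abs(actual) + np.abs(predicted)))

# A single near-zero actual (0.1) dominates ordinary MAPE:
actual = np.array([0.1, 2.0, -1.5])
pred = np.array([0.8, 1.8, -1.2])
mape = 100 * np.mean(np.abs((actual - pred) / actual))
print(round(mape, 1))                 # 243.3 -- blown up by the 0.1 quarter
print(round(smape(actual, pred), 1))  # 62.8
```

Because quarterly growth rates in the data hover around zero in several periods, RMSE and MAE are the more trustworthy headline numbers.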
The OLS model seemed to be the better option between the two.
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

original_gdp = ngdp.iloc[:, 0].dropna().values
predicted_diff = results.fittedvalues['Non-Oil GDP'].values

# Initial value: first value after the lag period
initial_value = original_gdp[results.k_ar]  # not results.k_ar - 1

# Reconstruct original GDP values from predicted diffs
reconstructed = [initial_value]
for diff in predicted_diff:
    reconstructed.append(reconstructed[-1] + diff)

# Drop first value to align with predicted_diff length
reconstructed = reconstructed[1:]

# Actual GDP aligned to match predicted diff length
actual_aligned = original_gdp[results.k_ar + 1:]  # skip one more to match length
time_index = np.arange(len(actual_aligned))

# Calculate evaluation metrics
var_mae = mean_absolute_error(actual_aligned, reconstructed)
var_rmse = mean_squared_error(actual_aligned, reconstructed, squared=False)
var_mape = np.mean(np.abs((actual_aligned - reconstructed) / actual_aligned)) * 100
var_r2 = r2_score(actual_aligned, reconstructed)

print(f"Mean Absolute Error (MAE): {var_mae:.2f}")
print(f"Root Mean Squared Error (RMSE): {var_rmse:.2f}")
print(f"Mean Absolute Percentage Error (MAPE): {var_mape:.1f}%")
print(f"R-squared (R²): {var_r2:.3f}")
Mean Absolute Error (MAE): 2.68
Root Mean Squared Error (RMSE): 3.15
Mean Absolute Percentage Error (MAPE): 182.2%
R-squared (R²): 0.480
dates = pd.date_range(start='2014-01-01', periods=len(actual_aligned), freq='Q')
# Plot
fig = go.Figure()
fig.add_trace(go.Scatter(
x=dates, y=actual_aligned,
mode='lines+markers',
name='Actual GDP Growth',
line=dict(color='#FF4136', width=2),
marker=dict(color='#FF4136', size=5)
))
fig.add_trace(go.Scatter(
x=dates, y=reconstructed,
mode='lines+markers',
name='Predicted GDP Growth (VAR)',
line=dict(color='green', width=2),
marker=dict(size=6)
))
# Shaded area between actual and predicted
fig.add_trace(go.Scatter(
x=np.concatenate([dates, dates[::-1]]),
y=np.concatenate([actual_aligned, reconstructed[::-1]]),
fill='toself',
fillcolor='rgba(144, 238, 144, 0.3)', # LightGreen with transparency
line=dict(color='rgba(255,255,255,0)'),
hoverinfo='skip',
name='Prediction Error',
showlegend=True,
))
# Layout styling (black background, white text)
fig.update_layout(
height=500, width=900,
margin=dict(l=50, r=40, t=40, b=40),
plot_bgcolor='black',
paper_bgcolor='black',
font=dict(color='white'),
xaxis_title='Date',
yaxis_title='QoQ Non-Oil GDP Growth (%)',
legend=dict(orientation='h', y=1.05, x=0.5, xanchor='center', yanchor='bottom')
)
# Axes
fig.update_yaxes(showgrid=False, zeroline=False, gridcolor='lightgray', dtick=2)
fig.update_xaxes(showgrid=False)
fig.show()
Impulse Response Analysis
Even though the VAR model had its limitations, it still provided a useful tool: Impulse Response Functions (IRFs), which show what happens to one variable when another suddenly changes, and how it reacts over time.
Think of it like tapping a glass of water. If you give it a quick tap, ripples form and gradually settle. The size of the ripples and how long they last tell you something about the stability and sensitivity of the water.
In the same way, an Impulse Response Function (IRF) shows how one variable reacts over time when another variable experiences a sudden “shock.” It helps visualize both the immediate impact and how long the effects persist before the system returns to normal.
In this case, I wanted to see how non-oil GDP responds to sudden changes in PMI, lending demand, real estate, and tourism while holding everything else constant.
This is especially useful in real-world scenarios, like when a new policy, interest rate shift, or external event causes a sudden movement in one sector. IRFs help reveal: - How strong the reaction is - Whether the effect is positive or negative - How long the impact lasts
For a country like the UAE, where the non-oil economy is growing fast but influenced by many moving parts, this kind of analysis helps us understand which sectors have the biggest short-term impact on growth.
Results
The impulse response results showed that shocks to the UAE Manufacturing PMI had the strongest and most immediate impact on non-oil GDP, peaking around the second and third quarters.
This suggests that rising business confidence leads to short-term economic expansion, but the effect reverses later, possibly due to over-optimism
Real estate prices, personal lending demand, and tourism inflows also produced early positive effects on GDP, though not as prominent as the PMI’s. These impacts were also short-lived, generally fading by the fourth quarter. This means that while sectors like real estate and tourism can boost the economy temporarily, they do not sustain momentum beyond one year.
Overall, these responses confirm that these indicators influence GDP within the first two to three quarters, which supports the idea that they could potentially be useful for short-term economic forecasting.
# IRF setup
irf = results.irf(4) # 4 periods ahead
irf_data = irf.orth_irfs
variables = results.names
response_var = 'Non-Oil GDP'
response_idx = variables.index(response_var)
n_steps = irf_data.shape[0]
lags = list(range(n_steps))
# Shock variables (exclude GDP itself)
shock_vars = [v for v in variables if v != response_var]
colors = px.colors.qualitative.Bold[:len(shock_vars)]
# Create figure
fig = go.Figure()
# Add line for each shock variable
for i, shock in enumerate(shock_vars):
    shock_idx = variables.index(shock)
    response = irf_data[:, response_idx, shock_idx]
    fig.add_trace(go.Scatter(
        x=lags,
        y=response,
        mode='lines+markers',
        name=f'Shock in {shock}',
        line=dict(color=colors[i], width=2),
        marker=dict(size=5)
    ))
# Add zero line
fig.add_hline(y=0, line_dash="dash", line_color="gray")
# Styling
fig.update_layout(
xaxis_title="Quarters After Shock",
yaxis_title="Non-Oil GDP Response",
plot_bgcolor='black',
paper_bgcolor='black',
font=dict(color='white'),
showlegend=True,
legend=dict(
orientation="h",
yanchor="bottom",
y=1.05,
xanchor="center",
x=0.5
),
height=500,
width=800,
margin=dict(t=60)
)
fig.update_xaxes(showgrid=False, zeroline=False, showline=False, linecolor='black', tickcolor='black',dtick=1)
fig.update_yaxes(showgrid=False,gridcolor='lightgrey', zeroline=False, showline=False, linecolor='black', tickcolor='black', dtick=0.2 )
fig.show()
Conclusion
While the forecasting models used in this study weren’t perfect, tools like correlation analysis and Impulse Response Functions (IRFs) still offered meaningful insights. They highlighted some early signals of predictive potential in these indicators even if the overall model accuracy was limited.
Looking ahead, future research could explore combining these variables with more traditional macroeconomic indicators. A mixed approach like this could help build more complete and accurate forecasting models.
Machine learning techniques also present an exciting opportunity. Although the current dataset is relatively small, as more data becomes available over time, models like random forests could uncover non-linear relationships that most econometric models might miss. More historical data generally makes these models smarter and more reliable.
Finally, there is significant potential to develop a UAE-specific Leading Economic Index built on these kinds of indicators. Such an index could serve as an early warning system for policymakers, investors, and businesses, providing near real-time insights for informed decision-making, similar to the effort by El Mahmah (2017) using conventional indicators (Central Bank of UAE, 2017).
References
Cherif, R., Hasanov, F., & Zhu, M., 2011. Breaking the Oil Spell: The Gulf Falcons’ Path to Diversification. IMF. Link
El Mahmah, M.A., 2017. CONSTRUCTING AN ECONOMIC COMPOSITE INDICATOR FOR THE UAE. Central Bank of the United Arab Emirates. Available at: https://www.centralbank.ae/media/sznddngl/wp19062017.pdf
Hann, R.N., Li, C., and Ogneva, M., 2017. Aggregate Earnings and Their Relation to Macroeconomic Activity: A Labor Market Perspective. SSRN Working Paper. Available at: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=2993654.
Wooldridge, J.M., 2013. Introductory Econometrics: A Modern Approach. 5th ed. Mason, OH: South-Western Cengage Learning.
Kale, A., 2020. Vector Autoregression (VAR) – Comprehensive Guide with Examples in Python. [online] MachineLearningPlus. Available at: https://www.machinelearningplus.com/time-series/vector-autoregression-examples-python/
Bentour, E.M. & Fund, A.M., 2022. The Role of Oil Prices in Forecasting Economic Growth in Oil Exporting Countries: Evidence from the Kingdom of Saudi Arabia and the United Arab Emirates. Available at: https://www.researchgate.net/publication/358039803
Magazzino, C., 2016. The relationship between real GDP, CO₂ emissions, and energy use in the GCC countries: A time series approach. Cogent Economics & Finance, 4(1). Available at: https://doi.org/10.1080/23322039.2016.1152729
Kireyev, A., 2000. Comparative Macroeconomic Dynamics in the Arab World: A Panel VAR Approach. SSRN. Available at: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=879461
Granger, C.W.J., 1969. Investigating causal relations by econometric models and cross-spectral methods. Econometrica, 37(3), pp.424-438. Available at: http://www.econ.uiuc.edu/~econ536/Papers/granger69.pdf.
Sims, C.A., 1980. Macroeconomics and reality. Econometrica, 48(1), pp.1-48. Available at: https://www.jstor.org/stable/1912017.
McCloskey, P.J. & Remor, R.M., 2025. Comparative Analysis of ARIMA, VAR, and Linear Regression Models for UAE GDP Forecasting. Emirati Journal of Business, Economics & Social Studies, 4(1), pp. 23–33. Available at: https://www.emiratesscholar.com/system/publish/070325040313116.pdf